Abstract

In this study we address the problem of answering queries over a peer-to-peer system of taxonomy-based
sources. A taxonomy states subsumption relationships between negation-free DNF formulas on terms and
negation-free conjunctions of terms. To lay the foundations of our study, we first consider the centralized
case, deriving the complexity of the decision problem and of query evaluation. We conclude by presenting an
algorithm that is efficient in data complexity and is based on hypergraphs. More expressive forms of taxonomies
are also investigated, which, however, lead to intractability. We then move to the distributed case and
introduce a logical model of a network of taxonomy-based sources. On such a network, a distributed version of
the centralized algorithm is then presented, based on a message-passing paradigm, and its correctness is
proved. We finally discuss optimization issues and relate our work to the literature.



1 Introduction 

Consider a tetrad $(T, \preceq, Obj, I)$ where $T$ is a set of terms, $\preceq$ is a subsumption relation over concepts
expressed using $T$ (e.g. $(Animal \wedge FlyingObject) \vee Penguin \preceq Bird$), $Obj$ is a set of objects and $I$ is a
function from $T$ to $\mathcal{P}(Obj)$, assigning a description (i.e., a set of terms) to each object. Now assume that
all these are not stored at a single place but are distributed over a set $\mathcal{N} = \{S_1, \ldots, S_n\}$ of
independent peers. Moreover, assume that each peer $S_i$ can have zero, one or more $\preceq$-relationships between
its terms (i.e. $T_i$) and some concepts over the terminologies of other peers (e.g. $Parrot_{S_j} \preceq Bird_{S_i}$ and
$Animal_{S_k} \wedge Flying_{S_k} \preceq Bird_{S_i}$). In this paper we address the problem of answering Boolean queries
over this kind of system.

Some parts of the work reported in this paper have already been published. Namely, [40] presents a
first model of a network of articulated sources, while [32] studies query evaluation on taxonomies including
only term-to-term subsumption relationships. Finally, [30] presents a procedure for evaluating queries
over centralized sources supporting term-to-query subsumption relationships, as well as hardness results for
extensions. In this paper,

- we consider from the start the most complex type of subsumption for which we can propose an efficient 
query evaluation procedure, allowing subsumption relationships between negation-free DNF combina- 
tions of terms and negation-free conjunctions of terms. We then place the hardness results presented 
in [30] in context, thus showing that any Boolean extension of the expressive power of subsumption 
leads to intractability of the query answering problem; 

- we ground the centralized query evaluation procedure for this kind of sources, presented in [30], on a
solid theoretical basis, proving its correctness and linking it to the existing algorithmic and complexity
literature;

- we present a distributed query evaluation procedure, based on a functional model of a peer; correctness 
and complexity of this procedure are given; 

- we describe optimization techniques that can be used for improving the efficiency of query evaluation;

- we relate our work to the existing literature on peer-to-peer systems. 

The paper is structured as follows: Section 2 gives the background on peer-to-peer systems, while Section 3
introduces sources, presenting the centralized query evaluation procedure. Networks of sources are considered
in Section 4, where our algorithm for query evaluation on networks is presented, and Section 5 discusses
optimization issues. Section 6 compares our work with related work and Section 7 concludes the paper.

2 Background 

A peer-to-peer (P2P) system is a distributed system in which participants (the peers) rely on one another for
service, rather than solely relying on dedicated and often centralized servers. The most popular P2P systems
have focused on specific application domains like music file sharing [3, 1, 2] or on providing file-system-like
capabilities [5]. In most cases, these systems do not provide semantic-based retrieval services, as the
name of an object (e.g. the title of a music file) is the only means for describing the contents of an object.

Semantic-based retrieval in P2P systems is a great challenge that raises questions about data models, 
conceptual modeling, query languages, algorithms and data structures for query evaluation, and techniques 
for dynamic schema mapping. Roughly, the language used for indexing the objects of the
domain and for formulating semantic-based queries can be free (e.g. natural language) or controlled, i.e.
object descriptions and queries may have to conform to a specific vocabulary and syntax. The former case
resembles distributed Information Retrieval (IR) systems, and this approach is applicable when
the objects of the domain have a textual content
the objects of the domain have a textual content (e.g. [^Hl HZl [TH [37] ) . In the latter case, the objects of a peer 
are indexed according to a specific conceptual model represented in a particular data model (e.g. relational, 
object-oriented, logic-based, etc.), and content searches are formulated using a specific query language. Of
course, a P2P system might impose a single conceptual model on all participants to enforce uniform, global
access, but this would be too restrictive. Alternatively, a limited number of conceptual models may be allowed,
so that traditional information mediation and integration techniques will likely apply (with the restriction
that there is no central authority), e.g. see [32, 31].

The case of fully heterogeneous conceptual models makes uniform global access extremely challenging 
and this is the focus of this paper. From a data modeling point of view, several approaches for P2P systems
have been proposed recently, including relational-based approaches [7], XML-based approaches [24] and
RDF-based approaches [31].

In this paper we consider the fully heterogeneous conceptual model approach (where each peer can 
have its own schema), with the only restriction that each conceptual model is represented as a taxonomy. 
A taxonomy can range from a simple tree-structured hierarchy of terms, to the concept lattice derived
by Formal Concept Analysis [12], or to the concept lattice of a Description Logics theory. Specifically,
according to our model, each peer consists of a taxonomy, an object base, i.e. a database that contains
descriptions of the objects according to the taxonomy, and a number of (one-way) articulations to some
of the other peers of the network, where an articulation is actually a mapping between terms of the peer
and terms (or queries) of other peers. Articulations aim at bridging the inevitable naming, granularity and
contextual heterogeneities that may exist between the taxonomies of the peers (for some examples see [40]).
For example, the taxonomy of a peer $S_1$ could be the following: $\{Penguin \preceq Animal,\ Pelican \preceq Animal,$
$Ostrich \preceq Animal,\ (Animal \wedge FlyingObject) \vee Penguin \vee Ostrich \preceq Bird\}$. The object base of $S_1$ could
be the following: $\{Ostrich(1), Bird(2), Animal(3), FlyingObject(3)\}$. $S_1$ could have an articulation
to a peer $S_2$ like $\{\text{Πιγκουίνος}_2 \preceq Penguin,\ \text{Πελεκάνος}_2 \preceq Pelican\}$, an articulation to a peer $S_3$ like
$\{Animale_3 \wedge Alato_3 \preceq Bird\}$, and an articulation to two peers $S_4, S_5$ of the form
$\{Fliegentier_4 \vee (Animal_5 \wedge Volant_5) \preceq Animal \wedge FlyingObject\}$.

The articulations can be exploited for finding objects in the network using content-based queries, for
publishing objects and their descriptions to the network, and for obtaining richer descriptions of the
objects (by aggregating their descriptions according to different conceptual models)¹. Apart from determin-
ing query propagation, these mappings are actually used for translating the query into a vocabulary that 
the recipient can understand (and thus answer). In certain cases, these inter-taxonomy mappings could be 
constructed automatically (e.g. using the data-driven method proposed in [35]). 

The placement of our work with respect to other logic-founded approaches for query evaluation over P2P
systems is given in Section 6.



3 Information sources 



This Section defines information sources and derives algorithms and complexity results for querying them.
These results will be applied later, when studying networks of sources. The model is introduced first; the
computational and algorithmic foundations of the query evaluation problem are then given; Section 3.3
presents an efficient query evaluation method. Finally, three extensions of information sources are discussed:
those having negation in the taxonomy, those having negation only in the query language, and those having
disjunction in the taxonomy. For each of these, the query evaluation problem is studied, deriving complexity
results or correct and efficient algorithms, if any.



3.1 The model 

The basic notion of the model is that of terminology: a terminology T is a non-empty set of terms. A 
terminology comes with an associated language for constructing more complex terms, called queries, from 
the given ones. 

Definition 1 (Query) The query language associated with a terminology $T$, $\mathcal{L}_T$, is the language defined by
the following grammar, where $t$ is a term of $T$:

$$q ::= d \mid q \vee d$$
$$d ::= t \mid t \wedge d.$$

An instance of $q$ is called a query, while an instance of $d$ is called a conjunctive query. Each $d$ component of
a query $q$ is called a disjunct of $q$. □

Terms and queries can be used for defining taxonomies. 

Definition 2 (Taxonomy) A taxonomy is a pair $(T, \preceq)$ where $T$ is a terminology and $\preceq$ is a binary relation
between queries, $\preceq \subseteq (\mathcal{L}_T \times \mathcal{L}_T)$, which is reflexive and transitive, such that $q \preceq q'$ and $q \neq q'$
imply that $q'$ is a conjunctive query. □

¹ The latter is possible only if the objects have a unique global identity in the entire network (like URIs, for example).

If $(q, q') \in \preceq$, we say that $q$ is subsumed by $q'$ and we write $q \preceq q'$. The reason for having only conjunctive
queries as right-hand sides of non-trivial subsumption relationships is computational, and will be discussed
later.

Definition 3 (Interpretation) An interpretation for a terminology $T$ is a pair $(Obj, I)$, where $Obj$ is a
finite set of objects and $I$ is a total function $I : T \rightarrow \mathcal{P}(Obj)$. □

Interpretations can be extended to queries in an intuitive way, thus defining the semantics of the query 
language: 

Definition 4 (Query extension) Given an interpretation $I$ of a terminology $T$ and a query $q \in \mathcal{L}_T$, the
extension of $q$ in $I$, $q^I$, is defined as follows:

1. $(q \vee d)^I = q^I \cup d^I$

2. $(t \wedge d)^I = t^I \cap d^I$

3. $t^I = I(t)$. □
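As an illustration of Definition 4, the following Python sketch represents a query of $\mathcal{L}_T$ as a list of disjuncts, each a set of terms read conjunctively, and computes its extension under an interpretation given as a dictionary from terms to sets of objects. The representation and the name `extension` are our own choices for illustration, not part of the model; terms missing from the dictionary are assumed to have an empty interpretation.

```python
from typing import Dict, FrozenSet, List, Optional, Set

# A negation-free DNF query (Definition 1): a list of disjuncts, each a
# non-empty set of terms read conjunctively.
Query = List[FrozenSet[str]]
# An interpretation I : T -> P(Obj); terms not listed are assumed to denote the empty set.
Interpretation = Dict[str, Set[int]]


def extension(query: Query, interp: Interpretation) -> Set[int]:
    """Compute I(q) as in Definition 4: union over the disjuncts,
    intersection over the terms of each disjunct."""
    result: Set[int] = set()
    for disjunct in query:
        objs: Optional[Set[int]] = None
        for term in disjunct:
            term_objs = interp.get(term, set())
            objs = set(term_objs) if objs is None else objs & term_objs
        if objs:
            result |= objs
    return result


# Object base of peer S1 from Section 2: Ostrich(1), Bird(2), Animal(3), FlyingObject(3).
I1: Interpretation = {"Ostrich": {1}, "Bird": {2}, "Animal": {3}, "FlyingObject": {3}}
q = [frozenset({"Animal", "FlyingObject"}), frozenset({"Penguin"}), frozenset({"Ostrich"})]
print(sorted(extension(q, I1)))  # [1, 3]
```

Note that this is the extension under the stored interpretation only; the answer of a query (Definition 7) also takes subsumption into account, since it ranges over all models of the source.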

Since the function $\cdot^I$ is an extension of the interpretation function $I$, we simplify notation and
write $I(q)$ in place of the formally correct $q^I$. We can now define a taxonomy-based source, called information
source or simply source.

Definition 5 (Information source) An information source $S$ is a 4-tuple $S = (T_S, \preceq_S, Obj_S, I_S)$, where
$(T_S, \preceq_S)$ is a taxonomy and $(Obj_S, I_S)$ is an interpretation for $T_S$. □

When no ambiguity arises, we simplify notation by omitting the subscript in the components of
sources. In addition, an interpretation will be equated with its interpretation function $I$. Given a source
$S = (T, \preceq, Obj, I)$ and an object $o \in Obj$, the index of $o$ in $S$, $ind_S(o)$, is given by the terms in whose
interpretation $o$ belongs, i.e.:

$$ind_S(o) = \{t \in T \mid o \in I(t)\}.$$
Some interpretations better reflect the semantics of subsumption. 

Definition 6 (Models of a source) Given two interpretations $I$, $I'$ of the same terminology $T$,

- $I$ is a model of the taxonomy $(T, \preceq)$ if $q \preceq q'$ implies $I(q) \subseteq I(q')$;

- $I$ is smaller than $I'$, $I \leq I'$, if $I(t) \subseteq I'(t)$ for each term $t \in T$;

- $I$ is a model of a source $S = (T, \preceq, Obj, I')$ if it is a model of $(T, \preceq)$ and $I' \leq I$. □

The notion of model of a source can be used to obtain a simpler, but equivalent, notion of source, in
which (non-trivial) subsumption relationships relate conjunctive queries to terms. The equivalence is based
on the observation that the propositional formula

$$(C_1 \vee \ldots \vee C_n) \rightarrow (t_1 \wedge \ldots \wedge t_m),$$

where each $C_i$ in the left-hand side is any propositional formula, is logically equivalent to the formula

$$(C_1 \rightarrow t_1) \wedge (C_1 \rightarrow t_2) \wedge \ldots \wedge (C_1 \rightarrow t_m) \wedge \ldots \wedge (C_n \rightarrow t_1) \wedge (C_n \rightarrow t_2) \wedge \ldots \wedge (C_n \rightarrow t_m),$$

that is, the two formulae have the same models. Formally, the simplification of a taxonomy $(T, \preceq)$ is the
taxonomy $(T, \sigma(\preceq))$, where $\sigma(\preceq)$ is the reflexive and transitive closure of the following relation:

$$\{(C, t) \mid (C_1 \vee \ldots \vee C_n,\ t_1 \wedge \ldots \wedge t_m) \in \preceq,\ C \in \{C_1, \ldots, C_n\},\ t \in \{t_1, \ldots, t_m\}\}.$$
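The simplification $\sigma$ can be sketched as follows. The Python function below is a hypothetical helper, not taken from the paper: it splits each relationship $(C_1 \vee \ldots \vee C_n, t_1 \wedge \ldots \wedge t_m)$ into the $n \cdot m$ pairs $(C_i, t_j)$ and returns only these generating pairs, leaving the reflexive and transitive closure implicit. The example uses the taxonomy of peer $S_1$ from Section 2 together with its articulation to $S_4$ and $S_5$, lumped into a single taxonomy purely for illustration.

```python
from typing import FrozenSet, List, Set, Tuple

Conj = FrozenSet[str]              # a conjunction of terms
DNF = List[Conj]                   # a disjunction of conjunctions
Subsumption = Tuple[DNF, Conj]     # (C1 v ... v Cn, t1 ^ ... ^ tm)


def simplify(taxonomy: List[Subsumption]) -> Set[Tuple[Conj, str]]:
    """Return the generating pairs (Ci, tj) of sigma(taxonomy);
    the reflexive and transitive closure is left implicit."""
    pairs: Set[Tuple[Conj, str]] = set()
    for lhs, rhs in taxonomy:
        for disjunct in lhs:        # each Ci of the left-hand side
            for term in rhs:        # each tj of the right-hand side
                pairs.add((disjunct, term))
    return pairs


def c(*terms: str) -> Conj:
    """Shorthand for a conjunction of terms."""
    return frozenset(terms)


TAXONOMY: List[Subsumption] = [
    ([c("Penguin")], c("Animal")),
    ([c("Pelican")], c("Animal")),
    ([c("Ostrich")], c("Animal")),
    ([c("Animal", "FlyingObject"), c("Penguin"), c("Ostrich")], c("Bird")),
    # articulation of S1 towards S4 and S5 (treated as local terms here)
    ([c("Fliegentier_4"), c("Animal_5", "Volant_5")], c("Animal", "FlyingObject")),
]

pairs = simplify(TAXONOMY)
print(len(pairs))  # 10
```

The last relationship alone yields four pairs, e.g. $(Animal_5 \wedge Volant_5, FlyingObject)$, which is exactly the splitting licensed by the equivalence above.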



² The transitive reduction of a binary relation $R$ on a set $X$ is defined as $R^r = R_1 \setminus R_1^2$, where $R_1 = R \setminus \{(a, a) \mid a \in X\}$
and $R_1^2 = R_1 \circ R_1$. In practice, $R^r$ is $R$ without reflexive and transitive relationships, and its graphical rendering is generally
known as the Hasse diagram of $R$.
Correspondingly, the simplification of a source $S = (T, \preceq, Obj, I)$ is the source $\sigma(S) = (T, \sigma(\preceq), Obj, I)$. It
is not difficult to see that:

Proposition 1 $J$ is a model of a source $S$ if and only if it is a model of $\sigma(S)$. □

Based on the last Proposition, from now on we use the terms "taxonomy" and "source" as synonyms of
"simplified taxonomy" and "simplified source", respectively. Formally, $(T, \preceq)$ and $S$ will stand for $(T, \sigma(\preceq))$
and $\sigma(S)$, respectively.

A second usage of the notion of model is to define the query-answering function ans on sources. 

Definition 7 (Answer) Given a source $S = (T, \preceq, Obj, I)$ and a query $q \in \mathcal{L}_T$, the answer of $q$ in $S$,
$ans(q, S)$, is given by $ans(q, S) = \{o \in Obj \mid o \in J(q)$ for all models $J$ of $S\}$. □

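Actually, we only need to consider term queries, because non-term queries can be embedded in the taxonomy.
Specifically:

Proposition 2 For all sources $S = (T, \preceq, Obj, I)$ and non-term queries $q \in \mathcal{L}_T$, let $t_q \notin T$ be a new term and

$$T^q = T \cup \{t_q\}$$
$$\preceq^q \text{ is the reflexive and transitive closure of } \preceq \cup \{(t_1 \wedge \ldots \wedge t_m, t_q) \mid t_1 \wedge \ldots \wedge t_m \text{ is a disjunct of } q\}$$
$$I^q = I \cup \{(t_q, \emptyset)\}.$$

Then, $ans(q, S) = ans(t_q, S^q)$, where $S^q = (T^q, \preceq^q, Obj, I^q)$. □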

In practice, the terminology includes one additional term $t_q$, which has an empty interpretation and
subsumes each query disjunct $t_1 \wedge \ldots \wedge t_m$ of $q$. The size of $S^q$ is clearly polynomial in the size of $S$ and $q$.
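The construction of Proposition 2 is easy to mechanize. The Python sketch below works on the simplified representation, a list of generating pairs (conjunction, term) whose reflexive and transitive closure is left implicit; the fresh term name `t_q` and the function `embed_query` are illustrative assumptions.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Rule = Tuple[FrozenSet[str], str]   # (t1 ^ ... ^ tm, t) in the simplified taxonomy


def embed_query(terms: Set[str], rules: List[Rule], interp: Dict[str, Set[int]],
                query: List[FrozenSet[str]], fresh: str = "t_q"):
    """Build (T^q, generating pairs of the q-extended taxonomy, I^q), as in Proposition 2."""
    if fresh in terms:
        raise ValueError(f"{fresh!r} is not a fresh term")
    new_terms = set(terms) | {fresh}
    # each disjunct of q is subsumed by the fresh term
    new_rules = list(rules) + [(frozenset(d), fresh) for d in query]
    new_interp = {t: set(objs) for t, objs in interp.items()}
    new_interp[fresh] = set()       # t_q has an empty interpretation
    return new_terms, new_rules, new_interp


TERMS = {"Animal", "FlyingObject", "Ostrich", "Bird"}
RULES: List[Rule] = [(frozenset({"Ostrich"}), "Animal")]
INTERP = {"Ostrich": {1}, "Animal": {3}, "FlyingObject": {3}}
QUERY = [frozenset({"Animal", "FlyingObject"}), frozenset({"Ostrich"})]

terms_q, rules_q, interp_q = embed_query(TERMS, RULES, INTERP, QUERY)
print(len(rules_q), interp_q["t_q"])  # 3 set()
```

Answering $q$ on the original source then amounts to answering the term query `t_q` on the extended one.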

In light of the last Proposition, the problem of query evaluation amounts to determining $ans(t, S)$ for a given
term $t$ and source $S$, while the corresponding decision problem consists in checking whether $o \in ans(t, S)$
for a given object $o$.

Query evaluation is strictly related to the unique minimal model of a source. 

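Proposition 3 For all sources $S = (T, \preceq, Obj, I)$ and terms $t \in T$, the unique minimal model of $S$, $\bar{I}$, is
given by

$$\bar{I}(t) = \bigcup\{I(u) \mid u \in T,\ u \preceq t\} \cup \bigcup\{\bar{I}(q) \mid q \preceq t,\ q = t_1 \wedge \ldots \wedge t_m,\ m > 1\}.$$

Moreover, $ans(t, S) = \bar{I}(t)$. □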
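Proposition 3 characterizes answers through the minimal model. One straightforward way to compute that model, sketched below in Python under the assumption that the taxonomy is given by its simplified generating pairs, is naive fixpoint iteration: start from $I$ and keep adding to each head term the objects in the intersection of its body terms until nothing changes. This is the textbook least-fixpoint technique for propositional Horn programs, not the method developed in this paper (see Section 3.3); chains implied by transitivity are followed by the iteration itself, so the closure need not be materialized.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Rule = Tuple[FrozenSet[str], str]  # (t1 ^ ... ^ tm, t), m >= 1


def minimal_model(rules: List[Rule], interp: Dict[str, Set[int]]) -> Dict[str, Set[int]]:
    """Least model of the source: the smallest I' >= I satisfying every rule."""
    terms: Set[str] = set(interp)
    for body, head in rules:
        terms |= body | {head}
    model = {t: set(interp.get(t, set())) for t in terms}
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            # objects in the extension of the conjunction `body` under the current model
            objs = set.intersection(*(model[u] for u in body))
            if not objs <= model[head]:
                model[head] |= objs
                changed = True
    return model


# Simplified taxonomy and object base of peer S1 (Section 2).
RULES: List[Rule] = [
    (frozenset({"Penguin"}), "Animal"),
    (frozenset({"Pelican"}), "Animal"),
    (frozenset({"Ostrich"}), "Animal"),
    (frozenset({"Animal", "FlyingObject"}), "Bird"),
    (frozenset({"Penguin"}), "Bird"),
    (frozenset({"Ostrich"}), "Bird"),
]
I1 = {"Ostrich": {1}, "Bird": {2}, "Animal": {3}, "FlyingObject": {3}}

model = minimal_model(RULES, I1)
print(sorted(model["Bird"]), sorted(model["Animal"]))  # [1, 2, 3] [1, 3]
```

Here object 1 reaches $Bird$ through $Ostrich$, and object 3 through the conjunction $Animal \wedge FlyingObject$.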

3.2 Foundations 

In this Section, we consider the computational foundations of query evaluation, starting from those of the 
more fundamental decision problem. 

3.2.1 The decision problem 

Given a source $S = (T, \preceq, Obj, I)$, $o \in Obj$, and $t \in T$, the decision problem $o \in ans(t, S)$ is P-complete in
the size of the taxonomy. The hardness part of the proof is based on the following polynomial time reduction
from the decision problem $P \models A$ in propositional datalog, known to be P-complete pTSj:

- the terminology $T$ is given by the letters occurring in $P$;

- $\preceq$ is the reflexive and transitive closure of the binary relation defined as follows:

$$\{(t_1 \wedge \ldots \wedge t_m, t_0) \mid t_0 \leftarrow t_1, \ldots, t_m \in P,\ m \geq 1\};$$

- $Obj = \{1\}$ and, for each letter $t$, $I(t) = \{1\}$ if $t \leftarrow$ is a fact in $P$, $I(t) = \emptyset$ otherwise.

It is easy to see that $P \models A$ if and only if $1 \in ans(A, (T, \preceq, Obj, I))$, thus obtaining hardness in the size of
the program $P$.

For the membership part of the proof, we rely on the opposite reduction, which will also be used later.
Let $S = (T, \preceq, Obj, I)$ be a source, $o \in Obj$ and $t \in T$. Define $P_S$ to be the following propositional datalog
program:

$$P_S = C_S \cup I_S \cup Q_S$$

where

$$C_S = \{t_0 \leftarrow t_1, \ldots, t_m \mid (t_1 \wedge \ldots \wedge t_m, t_0) \in \preceq^r\}$$
$$I_S = \{u \leftarrow \ \mid u \in ind_S(o)\}$$
$$Q_S = \{\leftarrow t\}$$

The size of $P_S$ is polynomial in the size of the taxonomy. It is easy to see that:

Lemma 1 For all sources $S = (T, \preceq, Obj, I)$, $o \in Obj$ and $t \in T$, $o \in ans(t, S)$ iff $P_S$ is unsatisfiable. □

This proves the membership of the decision problem in P, hence its P-completeness. From the P-completeness
of the decision problem in the size of the taxonomy, the P-completeness of the query evaluation problem in the
size of the information source³ follows.

³ The size of an information source comprises the size of its taxonomy and the size of its interpretation, i.e., what is called
combined complexity in the database literature.

From an algorithmic point of view, the decision problem relies on directed B-hypergraphs, which are
introduced next. We mainly use definitions and results from [21].

A directed hypergraph is a pair $\mathcal{H} = (V, \mathcal{E})$, where $V = \{v_1, v_2, \ldots, v_n\}$ is the set of vertices and
$\mathcal{E} = \{E_1, E_2, \ldots, E_m\}$ is the set of directed hyperedges, where $E_i = (T(E_i), H(E_i))$ with $T(E_i), H(E_i) \subseteq V$ for
$1 \leq i \leq m$. $T(E_i)$ is said to be the tail of $E_i$, while $H(E_i)$ is said to be the head of $E_i$. A directed B-hypergraph
(or simply B-graph) is a directed hypergraph where the head of each hyperedge $E_i$, denoted as $h(E_i)$, is a
single vertex.

A taxonomy can naturally be represented as a B-graph whose hyperedges represent one-to-one the sub-
sumption relationships of the transitive reduction² of the taxonomy. In particular, the taxonomy B-graph of
a taxonomy $(T, \preceq)$ is the B-graph $\mathcal{H} = (T, \mathcal{E}_\preceq)$, where

$$\mathcal{E}_\preceq = \{(\{t_1, \ldots, t_m\}, u) \mid (t_1 \wedge \ldots \wedge t_m, u) \in \preceq^r\}.$$

Figure 1 (left) presents a taxonomy, whose B-graph is shown in the same figure (right).

Figure 1: A taxonomy and its B-graph

Figure 2: An object graph

A path $P_{st}$ of length $q$ in a B-graph $\mathcal{H} = (V, \mathcal{E})$ is a sequence of nodes and hyperedges

$$P_{st} = (s = v_1, E_{i_1}, v_2, E_{i_2}, \ldots, E_{i_q}, v_{q+1} = t)$$

where $s \in T(E_{i_1})$, $h(E_{i_q}) = t$ and $h(E_{i_{j-1}}) = v_j \in T(E_{i_j})$ for $2 \leq j \leq q$. If $P_{st}$ exists, $t$ is said to be
connected to $s$. If $t \in T(E_{i_1})$, $P_{st}$ is said to be a cycle; if all hyperedges in $P_{st}$ are distinct, $P_{st}$ is said to be
simple. A simple path is elementary if all its vertices are distinct.

A B-path $\pi_{st}$ in a B-graph $\mathcal{H} = (V, \mathcal{E})$ is a minimal (with respect to deletion of vertices and hyperedges)
hypergraph $\mathcal{H}_\pi = (V_\pi, \mathcal{E}_\pi)$ such that:

1. $\mathcal{E}_\pi \subseteq \mathcal{E}$;

2. $s, t \in V_\pi$ and $V_\pi \subseteq V$;

3. $x \in V_\pi$ and $x \neq s$ imply that $x$ is connected to $s$ in $\mathcal{H}_\pi$ by means of a cycle-free simple path.

Vertex $y$ is said to be B-connected to vertex $x$ if a B-path $\pi_{xy}$ exists in $\mathcal{H}$.

B-graphs and satisfiability of propositional Horn clauses are closely related. The B-graph associated with
a set of Horn clauses uses 3 types of directed hyperedges to represent clauses:

- the clause $p \leftarrow q_1 \wedge q_2 \wedge \ldots \wedge q_s$ is represented by the hyperedge $(\{q_1, q_2, \ldots, q_s\}, p)$;

- the clause $\leftarrow q_1 \wedge q_2 \wedge \ldots \wedge q_s$ is represented by the hyperedge $(\{q_1, q_2, \ldots, q_s\}, false)$;

- the clause $p \leftarrow$ is represented by the hyperedge $(\{true\}, p)$.

The following result is well known:

Proposition 4 ([21]) A set of propositional Horn clauses is satisfiable if and only if, in the associated
B-graph, false is not B-connected to true. □

We now proceed to show the role played by B-connection in query evaluation. For a source $S = (T, \preceq, Obj, I)$
and an object $o \in Obj$, the object decision graph (simply the object graph) is the B-graph
$\mathcal{H}_o = (T \cup \{true\}, \mathcal{E}_o)$, where

$$\mathcal{E}_o = \mathcal{E}_\preceq \cup \{(\{true\}, u) \mid u \in ind_S(o)\}.$$

Figure 2 presents the object graph for the taxonomy shown in Figure 1 and an object $o$ such that
$ind_S(o) = \{c1, c2, c3\}$.

We can now prove:

Proposition 5 For all sources $S = (T, \preceq, Obj, I)$, terms $t \in T$, and objects $o \in Obj$, $o \in ans(t, S)$ iff $t$ is
B-connected to true in the object graph $\mathcal{H}_o$.

Proof: From Lemma 1, $o \in ans(t, S)$ iff $P_S$ is unsatisfiable iff (by Proposition 4) false is B-connected to true
in the associated B-graph. By construction, $\mathcal{H}_o$ is the B-graph associated with $P_S$, where $t$ plays the role of
false. □
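Propositions 4 and 5 suggest a direct decision procedure: build the object graph and test B-connection from true. The Python sketch below uses the usual linear-time marking scheme for B-visits, which is the forward-chaining test for Horn satisfiability: each hyperedge keeps a counter of tail vertices not yet reached, and its head is reached when the counter drops to zero. The example hyperedges are our reconstruction of the Figure 1 taxonomy from the calls listed in Table 1, and the vertex name `"true"` is assumed not to clash with any term.

```python
from collections import defaultdict, deque
from typing import Dict, FrozenSet, List, Set, Tuple

Hyperedge = Tuple[FrozenSet[str], str]  # (tail, head) of a B-graph hyperedge


def b_visit(edges: List[Hyperedge], source: str) -> Set[str]:
    """Return the set of vertices B-connected to `source`."""
    remaining = [len(tail) for tail, _ in edges]       # unreached tail vertices per hyperedge
    leaving: Dict[str, List[int]] = defaultdict(list)  # hyperedges having v in their tail
    for i, (tail, _) in enumerate(edges):
        for v in tail:
            leaving[v].append(i)
    reached = {source}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for i in leaving[v]:
            remaining[i] -= 1
            head = edges[i][1]
            if remaining[i] == 0 and head not in reached:
                reached.add(head)
                queue.append(head)
    return reached


def decide(edges: List[Hyperedge], index: Set[str], t: str) -> bool:
    """o in ans(t, S) iff t is B-connected to true in the object graph (Proposition 5)."""
    object_graph = edges + [(frozenset({"true"}), u) for u in index]
    return t in b_visit(object_graph, "true")


# Assumed taxonomy B-graph of Figure 1, reconstructed from Table 1.
EDGES: List[Hyperedge] = [
    (frozenset({"b3"}), "a2"), (frozenset({"b1", "b2"}), "a2"),
    (frozenset({"c1"}), "b1"), (frozenset({"c2"}), "b1"),
    (frozenset({"c2", "c3"}), "b2"), (frozenset({"b1", "b3"}), "c2"),
]
index = {"c1", "c2", "c3"}   # the object of Figure 2
print(decide(EDGES, index, "a2"), decide(EDGES, index, "b3"))  # True False
```

Each hyperedge is handled once per tail vertex, so the test runs in time linear in the size of the object graph, in line with the P-membership argument above.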



3.2.2 Foundation of query evaluation 

The basic reason why the decision problem can be solved efficiently is that it requires traversing each
hyperedge of the taxonomy B-graph at most once. In other words, when deciding membership of an object
in a query answer, each (non-trivial) subsumption relationship needs to be used no more than once. However,
this is not the case for query evaluation, for in this case all objects must be considered at once as potential
candidates for the answer, and therefore a hyperedge can be traversed more than once, in different ways.
From a more technical point of view, in deciding whether $o \in ans(t, S)$, we consider the cycle-free simple
paths from any term in $ind_S(o)$ to $t$. These paths make up a B-path in $\mathcal{H}_o$. Instead, in computing $ans(t, S)$,
we need to consider a much larger hypergraph, call it $\mathcal{H}_{obj}$, in which true is connected to all terms in $T$
that belong to at least one object index. $\mathcal{H}_{obj}$ is made up of all cycle-free simple paths from any term to $t$.
Now, it is not difficult to see that these paths may be exponentially many in the size of the taxonomy. As
an illustration, let us consider the taxonomy whose B-graph contains the following hyperedges:

$$h_1: (\{u_1, v_1\}, u_2) \quad h_2: (\{u_2, v_2\}, u_3) \quad h_3: (\{u_3, v_3\}, u_4) \quad h_4: (\{u_4, v_4\}, u_5) \quad h_5: (\{u_5, v_5\}, t)$$
$$g_1: (\{u_1, v_1\}, v_2) \quad g_2: (\{u_2, v_2\}, v_3) \quad g_3: (\{u_3, v_3\}, v_4) \quad g_4: (\{u_4, v_4\}, v_5)$$

Let us assume $t$ is the query term. It is easy to verify that there are $2^4$ cycle-free simple paths connecting
$u_1$ to $t$, one for each sequence of the form

$$(u_1\ f_1\ x_2\ f_2\ x_3\ f_3\ x_4\ f_4\ x_5\ h_5\ t)$$

where $f_i$ can be either $h_i$ (in which case $x_{i+1}$ is $u_{i+1}$) or $g_i$ (in which case $x_{i+1}$ is $v_{i+1}$) for $1 \leq i \leq 4$. In fact,
any object $o$ whose index $ind_S(o)$ contains either both $u_j$ and $v_j$ (for some $1 \leq j \leq 5$) or $t$ is in the answer
of the query, and so there is an exponential number of indices which qualify for the query. In order to avoid
examining all these indices, a smart query evaluation algorithm could try to generate only the minimal ones,
which in our case are just 6. However, finding all minimal qualifying indices is an NP-hard problem.
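The blow-up can be checked mechanically on this example. The brute-force Python sketch below is purely illustrative (and exponential by design): it counts the paths from $u_1$ to $t$ and enumerates the minimal indices that qualify for $t$, i.e. the minimal sets of terms from which $t$ is derivable. Since this B-graph is acyclic, counting paths with distinct vertices gives exactly the cycle-free simple paths.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

Hyperedge = Tuple[FrozenSet[str], str]

# h_i: ({u_i, v_i}, u_{i+1}) for i <= 4, h_5: ({u_5, v_5}, t); g_i: ({u_i, v_i}, v_{i+1}).
EDGES: List[Hyperedge] = [(frozenset({f"u{i}", f"v{i}"}), f"u{i + 1}") for i in range(1, 5)]
EDGES += [(frozenset({f"u{i}", f"v{i}"}), f"v{i + 1}") for i in range(1, 5)]
EDGES.append((frozenset({"u5", "v5"}), "t"))
TERMS = [f"u{i}" for i in range(1, 6)] + [f"v{i}" for i in range(1, 6)] + ["t"]


def count_paths(s: str, goal: str, visited: FrozenSet[str]) -> int:
    """Number of paths from s to goal that never revisit a vertex."""
    if s == goal:
        return 1
    return sum(count_paths(head, goal, visited | {head})
               for tail, head in EDGES if s in tail and head not in visited)


def derives(index: Set[str], goal: str) -> bool:
    """Forward chaining: is goal derivable from the terms in index?"""
    known = set(index)
    changed = True
    while changed:
        changed = False
        for tail, head in EDGES:
            if head not in known and tail <= known:
                known.add(head)
                changed = True
    return goal in known


minimal: List[FrozenSet[str]] = []
for size in range(1, len(TERMS) + 1):
    for combo in combinations(TERMS, size):
        cand = frozenset(combo)
        # qualification is monotone, so a qualifying set containing no
        # previously found (smaller) minimal set is itself minimal
        if derives(cand, "t") and not any(m < cand for m in minimal):
            minimal.append(cand)

print(count_paths("u1", "t", frozenset({"u1"})), len(minimal))  # 16 6
```

The six minimal indices are $\{t\}$ and the five pairs $\{u_j, v_j\}$, matching the count given above.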

In proof, let us define an answer set $A$ for a term query $t$ to a source $S = (T, \preceq, Obj, I)$ to be a set of
terms $A \subseteq T$ such that, if the index of an object $o$, $ind_S(o)$, contains all the terms in $A$, then $o$ is an answer
for $t$ in $S$; formally, $A \subseteq ind_S(o)$ implies $o \in ans(t, S)$. We now present a polynomial time reduction from
MINIMAL HITTING SET, a problem known to be NP-complete, to the problem of finding a minimal answer set for
$t$ in $S$. We recall the notion of hitting set: given a collection $\mathcal{C}$ of subsets of a set $C$, a hitting set for $\mathcal{C}$ is
a set $C' \subseteq C$ such that $C'$ contains at least one element from each subset in $\mathcal{C}$. The basic working of the
reduction is exemplified in Figure 3, the left part of which shows the collection $\mathcal{C}$, while the right part shows
the corresponding taxonomy. The query is $t$. In general, letting $\mathcal{C} = \{C_1, \ldots, C_k\}$ be a collection of subsets
of a set $C$, the corresponding source $S_\mathcal{C} = (T_\mathcal{C}, \preceq_\mathcal{C}, \emptyset, \emptyset)$ and term query $t_\mathcal{C}$ are defined as follows:

- $T_\mathcal{C} = C \cup \{t, u_1, \ldots, u_k\}$, where $t \notin C$ and $u_i \notin C$ for $1 \leq i \leq k$;

- $\preceq_\mathcal{C}$ is the reflexive and transitive closure of $\bigcup_{1 \leq i \leq k} \{(x, u_i) \mid x \in C_i\} \cup \{(u_1 \wedge \ldots \wedge u_k, t)\}$;

- $t_\mathcal{C} = t$.

It can easily be proved that this is a polynomial time reduction and, by construction, $\preceq_\mathcal{C}$ is reflexive
and transitive. Moreover, the terms from which each term $u_i$ can be reached in the taxonomy B-graph are
those of the $i$-th set $C_i$ of $\mathcal{C}$, plus $u_i$ itself. Consequently, each hitting set for $\mathcal{C}$ contains a sub-term
of each $u_i$, and therefore it is an answer set for $t_\mathcal{C}$ and $S_\mathcal{C}$; in addition, the minimality of the former implies
that of the latter. The converse is not true, because a minimal answer set $X$ for $t_\mathcal{C}$ and $S_\mathcal{C}$ may contain a
"foreign" term $u_i$. However, this is harmless, for $u_i$ can be replaced in $X$ by any of its sub-terms and the
result is still a minimal hitting set for $\mathcal{C}$. This proves NP-hardness.
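The reduction can be exercised on a tiny instance. In the Python sketch below, the collection $\mathcal{C} = \{\{a, b\}, \{b, c\}\}$ over $C = \{a, b, c\}$ is a made-up example and the helper names are our own; the code builds the hyperedges of $(T_\mathcal{C}, \preceq_\mathcal{C})$, enumerates minimal answer sets and minimal hitting sets by brute force, and checks that every minimal hitting set is a minimal answer set, and that the minimal answer sets made only of elements of $C$ are exactly the minimal hitting sets. The names `u1`, `u2` and `t` are assumed not to occur in $C$.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, List, Set, Tuple

Hyperedge = Tuple[FrozenSet[str], str]


def build_taxonomy(collection: List[Set[str]]) -> List[Hyperedge]:
    """Hyperedges of the reduction: x -> u_i for x in C_i, and u_1 ^ ... ^ u_k -> t."""
    edges = [(frozenset({x}), f"u{i}") for i, ci in enumerate(collection, 1) for x in ci]
    edges.append((frozenset(f"u{i}" for i in range(1, len(collection) + 1)), "t"))
    return edges


def derives(edges: List[Hyperedge], index: Iterable[str], goal: str) -> bool:
    """Forward chaining from the terms in index."""
    known = set(index)
    changed = True
    while changed:
        changed = False
        for tail, head in edges:
            if head not in known and tail <= known:
                known.add(head)
                changed = True
    return goal in known


def minimal_sets(universe: List[str], ok: Callable[[FrozenSet[str]], bool]) -> Set[FrozenSet[str]]:
    """Minimal subsets satisfying a monotone predicate, found by increasing size."""
    found: List[FrozenSet[str]] = []
    for size in range(1, len(universe) + 1):
        for combo in combinations(universe, size):
            cand = frozenset(combo)
            if ok(cand) and not any(m < cand for m in found):
                found.append(cand)
    return set(found)


# A made-up instance of MINIMAL HITTING SET.
GROUND = ["a", "b", "c"]
COLLECTION = [{"a", "b"}, {"b", "c"}]
EDGES = build_taxonomy(COLLECTION)
TERMS = GROUND + ["u1", "u2", "t"]

answer_sets = minimal_sets(TERMS, lambda x: derives(EDGES, x, "t"))
hitting_sets = minimal_sets(GROUND, lambda x: all(x & ci for ci in COLLECTION))

assert hitting_sets <= answer_sets
assert {x for x in answer_sets if x <= set(GROUND)} == hitting_sets
print(sorted(sorted(h) for h in hitting_sets))  # [['a', 'c'], ['b']]
```

On this instance, the minimal answer sets also include sets with "foreign" terms such as $\{u_1, u_2\}$, as well as $\{t\}$, which is why the comparison is restricted to subsets of $C$.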
Figure 3: A collection of sets $\mathcal{C}$ and the B-graph of the corresponding taxonomy $(T_\mathcal{C}, \preceq_\mathcal{C})$



Notice that the reduction uses a much simpler type of information source than the one we consider in
the present study, namely one whose taxonomy has only one hyperedge with more than one term in its tail.
Also, we have left the domain and the interpretation of $S_\mathcal{C}$ empty in order to stress that they play no role
in the reduction.

It is not difficult to prove membership of the problem in NP, from which the NP-completeness in the size
of the taxonomy of finding one minimal answer set follows. However, query evaluation requires finding all
minimal answer sets, thus the complexity of this latter problem is much worse; in fact, we believe that it is
PSPACE-complete.

We now turn to the derivation of an algorithm for query evaluation, whose complexity is polynomial in
the size of the information source (which may be exponentially larger than that of the taxonomy, of course).

3.3 Query evaluation 

Proposition 3 does not directly lead to a simple method for query evaluation, as it may yield a recursive set
of equations. As an illustration, let us consider the query $b1$ in our example source. We have:

$$\bar{I}(b1) = I(b1) \cup I(c1) \cup I(c2) \cup \bar{I}(b1 \wedge b3)$$
$$\bar{I}(b1 \wedge b3) = \bar{I}(b1) \cap \bar{I}(b3).$$

The standard datalog approach to solve this problem is to map the program into a system of equations on
relations, which is then solved by applying an iterative method (see Chapter 13 of [5]). Given the simplified
form of the datalog programs we are dealing with, we propose a simpler method to perform query evaluation,
based on B-graphs. Our method relies on the following result, which is just a rephrasing of Proposition 5.

Corollary 1 For all sources $S = (T, \preceq, Obj, I)$, $o \in Obj$ and term queries $t \in T$, $o \in ans(t, S)$ if and only if
either $o \in I(t)$ or there exists a hyperedge $(\{u_1, \ldots, u_r\}, t) \in \mathcal{E}_\preceq$ such that $o \in \bigcap\{ans(u_i, S) \mid 1 \leq i \leq r\}$. □

This corollary simply "breaks down" Proposition 5 based on the distance between $t$ and true in the object
graph $\mathcal{H}_o$. If $o \in I(t)$, then $t \in ind_S(o)$, hence there is a hyperedge (in fact, a simple arc) from true to $t$ in
$\mathcal{H}_o$, so the two are 1 hyperedge apart. If $o \notin I(t)$, then there are at least two hyperedges
between true and $t$. Let us assume that $h$ is the one whose head is $t$. Since $t$ is B-connected to true, each term
$u_i$ in the tail of $h$ is B-connected to true. But this simply means, again by Proposition 5, that $o \in ans(u_i, S)$
for all the terms $u_i$, and so we have the Corollary. Notice that, by point 3 in the definition of B-path, $t$ is
connected to each $u_i$ by a cycle-free simple path; this fact is used by the procedure QE in order to terminate
correctly in the presence of loops in the taxonomy B-graph $\mathcal{H}$.

The procedure QE, presented in Figure 4, computes $ans(t, S)$ for a given term $t$ (and an implicitly given
source $S$) by applying Corollary 1 in a straightforward way. To this end, QE must be invoked as QE(t, {t}).
The second input parameter of QE is the set of terms on the path from $t$ to the currently considered term $x$.
This set is used to guarantee that $t$ is connected to all terms considered in the recursion by a cycle-free simple
path. QE accumulates the result in $R$. The correctness of QE can be established by just observing that,
for all objects $o \in Obj$, $o$ is in the set $R$ returned by QE(t, {t}) if and only if $o$ satisfies the two conditions
expressed by Corollary 1.
QE(x: term; A: set of terms);

1. R ← I(x)

2. for each hyperedge ({u1, ..., ur}, x) in H do

3.     if {u1, ..., ur} ∩ A = ∅ then R ← R ∪ (QE(u1, A ∪ {u1}) ∩ ... ∩ QE(ur, A ∪ {ur}))

4. return(R)

Figure 4: The procedure QE
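A direct Python transcription of Figure 4 is sketched below under stated assumptions: the B-graph is a list of (tail, head) pairs and the interpretation a dictionary; the example hyperedges are our reconstruction of the Figure 1 taxonomy from the calls in Table 1, and the interpretation over objects 1 to 4 is made up. The optional `trace` list, an addition for illustration, records each call so the run can be compared with Table 1.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Hyperedge = Tuple[FrozenSet[str], str]


def qe(x: str, path: FrozenSet[str], edges: List[Hyperedge],
       interp: Dict[str, Set[int]],
       trace: Optional[List[Tuple[str, FrozenSet[str]]]] = None) -> Set[int]:
    """Procedure QE: objects in ans(x, S), given the set A of terms already on the path."""
    if trace is not None:
        trace.append((x, path))
    result = set(interp.get(x, set()))                 # line 1: R <- I(x)
    for tail, head in edges:                           # line 2: hyperedges entering x
        if head != x or tail & path:                   # line 3: skip if the tail meets A
            continue
        partial: Optional[Set[int]] = None
        for u in tail:                                 # QE(u1, A U {u1}) n ... n QE(ur, A U {ur})
            sub = qe(u, path | {u}, edges, interp, trace)
            partial = sub if partial is None else partial & sub
        result |= partial or set()
    return result                                      # line 4: return(R)


# Assumed taxonomy B-graph of Figure 1, reconstructed from Table 1.
EDGES: List[Hyperedge] = [
    (frozenset({"b3"}), "a2"), (frozenset({"b1", "b2"}), "a2"),
    (frozenset({"c1"}), "b1"), (frozenset({"c2"}), "b1"),
    (frozenset({"c2", "c3"}), "b2"), (frozenset({"b1", "b3"}), "c2"),
]
# Hypothetical interpretation over objects 1..4.
INTERP = {"c1": {1}, "c2": {1}, "c3": {1, 4}, "b3": {2, 3}, "b1": {3}}

calls: List[Tuple[str, FrozenSet[str]]] = []
print(sorted(qe("a2", frozenset({"a2"}), EDGES, INTERP, calls)), len(calls))  # [1, 2, 3] 11
```

The 11 recorded calls coincide with the rows of Table 1, since the set of calls does not depend on the interpretation.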



Table 1: Evaluation of QE(a2, {a2})

| Call | Result |
|---|---|
| QE(a2, {a2}) | I(a2) ∪ QE(b3, {a2, b3}) ∪ (QE(b1, {a2, b1}) ∩ QE(b2, {a2, b2})) |
| QE(b3, {a2, b3}) | I(b3) |
| QE(b1, {a2, b1}) | I(b1) ∪ QE(c1, {a2, b1, c1}) ∪ QE(c2, {a2, b1, c2}) |
| QE(b2, {a2, b2}) | I(b2) ∪ (QE(c2, {a2, b2, c2}) ∩ QE(c3, {a2, b2, c3})) |
| QE(c1, {a2, b1, c1}) | I(c1) |
| QE(c2, {a2, b1, c2}) | I(c2) * |
| QE(c2, {a2, b2, c2}) | I(c2) ∪ (QE(b1, {a2, b2, c2, b1}) ∩ QE(b3, {a2, b2, c2, b3})) |
| QE(c3, {a2, b2, c3}) | I(c3) |
| QE(b1, {a2, b2, c2, b1}) | I(b1) ∪ QE(c1, {a2, b2, c2, b1, c1}) * |
| QE(b3, {a2, b2, c2, b3}) | I(b3) |
| QE(c1, {a2, b2, c2, b1, c1}) | I(c1) |



As an example, let us consider the sequence of calls made by the procedure QE in evaluating the query
a2 in the example source, as shown in Table 1. The calls marked with a * are those in which the test
in line 3 gives a negative result. Upon evaluating QE(c2, {a2, b1, c2}), the procedure realizes that the only
hyperedge entering c2 is ({b1, b3}, c2), whose tail {b1, b3} has a non-empty intersection with the current
path {a2, b1, c2}; so the hyperedge is ignored. In this case, the cycle (b1, c2, b1) is detected and properly
handled. Analogously, upon evaluating QE(b1, {a2, b2, c2, b1}), the cycle (c2, b1, c2) is detected and properly
handled. Also notice the difference between the calls QE(c2, {a2, b1, c2}) and QE(c2, {a2, b2, c2}). Both
concern c2, but in the former case, c2 is encountered upon descending along the path (a2, b1, c2), whose next
hyperedge is ({b1, b3}, c2); following that hyperedge would lead the computation back to the node b1, which
has already been met, thus the result of the call is just I(c2). In the latter case, c2 is encountered upon
descending along the path (a2, b2, c2), thus the hyperedge leading to b1 and b3 must be followed, since none
of the terms in its tail have been touched upon so far.



From a complexity point of view, QE visits all terms that lie on a cycle-free simple path ending at the
query term $t$ in the taxonomy B-graph $\mathcal{H}$, once per such path. As shown in Section 3.2.2, the number of
such paths, and hence of recursive calls, can be exponential in the size of the taxonomy. For each call, QE
performs set-theoretic operations on sets of objects, which have polynomial time complexity. Thus, though QE
operates in exponential time in the size of the taxonomy, it has polynomial time complexity in the size of the
information source.

From a more practical point of view, there is an obvious alternative to QE for computing $ans(t, S)$, that
is, to solve the decision problem for each object $o \in Obj$. However, this method is not practically applicable
to peer-to-peer networks, thus we do not consider it any further.

3.4 Negation 

In this section we deal with negation. We first consider the addition of negation to the taxonomy of the 
source, then the simpler case in which negation is used in queries only. 
3.4.1 Adding negation to the taxonomy



If the queries in taxonomy relationships contain negation, then the source corresponds to a datalog program
with rules that contain negation in their bodies, and it is well known (e.g. see [41]) that such programs may
not have a unique minimal model. This is illustrated by the source shown in Figure 5: the left part shows
the source taxonomy, while the right part shows the source interpretation, $I$, and two minimal models, $I_a$
and $I_b$.



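[Figure 5. Left: the source taxonomy, a diagram over the terms a1, a2, b1, b2 and the queries a2 ∧ ¬a1 and
b2 ∧ ¬b1. Right: the source interpretation $I$ and the two minimal models $I_a$ and $I_b$:]

| query | $I$ | $I_a$ | $I_b$ |
|---|---|---|---|
| a1 | ∅ | {1} | ∅ |
| a2 | {1} | {1} | {1} |
| b1 | ∅ | ∅ | {1} |
| b2 | {1} | {1} | {1} |
| b2 ∧ ¬b1 | {1} | {1} | ∅ |
| a2 ∧ ¬a1 | {1} | ∅ | {1} |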

Figure 5: A source with no unique minimal model 
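With a single object, an interpretation is just the set of terms containing that object, so the minimal models can be found by brute force. In the Python sketch below, the taxonomy $\{a2 \wedge \neg a1 \preceq b1,\ b2 \wedge \neg b1 \preceq a1\}$ is our assumption, reconstructed so as to be consistent with the values listed in Figure 5; the diagram itself is not recoverable from the text.

```python
from itertools import product
from typing import FrozenSet, List, Tuple

TERMS = ["a1", "a2", "b1", "b2"]
# Assumed neg-extended taxonomy: (body literals, head); a literal is (term, positive?).
RULES: List[Tuple[List[Tuple[str, bool]], str]] = [
    ([("a2", True), ("a1", False)], "b1"),   # a2 ^ -a1 <= b1
    ([("b2", True), ("b1", False)], "a1"),   # b2 ^ -b1 <= a1
]
STORED = frozenset({"a2", "b2"})             # I(a2) = I(b2) = {1}


def is_model(true_terms: FrozenSet[str]) -> bool:
    """Is the one-object interpretation `true_terms` a model of the source?"""
    if not STORED <= true_terms:
        return False
    for body, head in RULES:
        fires = all((t in true_terms) == positive for t, positive in body)
        if fires and head not in true_terms:
            return False
    return True


candidates = [frozenset(t for t, on in zip(TERMS, bits) if on)
              for bits in product([False, True], repeat=len(TERMS))]
models = [m for m in candidates if is_model(m)]
minimal = {m for m in models if not any(n < m for n in models)}

print(sorted(sorted(m) for m in minimal))
# [['a1', 'a2', 'b2'], ['a2', 'b1', 'b2']]
print(all("a1" in m for m in models), all("a1" in m or "b1" in m for m in models))
# False True
```

Object 1 thus belongs to the answer of a1 ∨ b1 but not to that of a1 alone, although a1 holds in the minimal model $I_a$: no single minimal model computes the answers.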

The lack of a unique minimal model turns out to be a serious drawback. Let $\mathcal{L}^\neg_T$ be the query language in
which negations of terms may occur, i.e. $\mathcal{L}^\neg_T$ is given by (as usual, $t$ is a term in $T$):

$$q ::= d \mid q \vee d \quad (q \text{ is a query})$$
$$d ::= l \mid l \wedge d \quad (d \text{ is a disjunct})$$
$$l ::= t \mid \neg t \quad (l \text{ is a literal}).$$

Moreover, let $\mathcal{D}^\neg_T$ be the sub-language of $\mathcal{L}^\neg_T$ consisting of just disjuncts. A neg-extended taxonomy is a pair
$(T, \preceq^\neg)$, where $T$ is a terminology and $\preceq^\neg \subseteq (\mathcal{D}^\neg_T \times \mathcal{D}^\neg_T)$ is reflexive and transitive, such that if
$q_1 \preceq^\neg q_2$ and $q_1 \neq q_2$, then $q_2 = t$ for some term $t \in T$. A neg-extended source is a 4-tuple $(T, \preceq^\neg, Obj, I)$,
where $(T, \preceq^\neg)$ is a neg-extended taxonomy and $I$ is an interpretation for it.

It can be proved that:

Proposition 6 Deciding whether an object o ∈ Obj is in the answer of a query q ∈ L^¬_T in a neg-extended source S, i.e. whether o ∈ ans(q, S), is a coNP-hard problem.

The proof is based on the following polynomial reduction from SAT. Let α be a CNF formula of propositional logic over an alphabet V, that is:

$$\alpha = \bigwedge_{i=1}^{n} \bigvee_{j=1}^{m_i} l_{ij}$$

where each l_ij is either a positive literal, that is a letter v ∈ V, or a negative literal, that is ¬v where v ∈ V; we write α_i for the i-th conjunct of α. We map α into a neg-extended source S_α = (T_α, ⪯_α, Obj_α, I_α) and a query q_α as follows:

- T_α = V;

- Obj_α = {1};

- the query q_α is given by

$$q_\alpha = \bigvee\{\, v_1 \land \dots \land v_k \mid \lnot v_1 \lor \dots \lor \lnot v_k \text{ is a conjunct } \alpha_i \text{ of } \alpha,\ v_1, \dots, v_k \in V \,\}.$$

If there is no conjunct ¬v1 ∨ ... ∨ ¬vk in α, then let α_i be l1 ∨ ... ∨ lk; we then set q_α = l̄1 ∧ ... ∧ l̄k, where l̄ denotes the complement of l, i.e. the complement of ¬u is u and the complement of v is ¬v;

- for each remaining conjunct α_i in α:

1. if α_i is a letter v, then I_α(v) = {1}; if for no conjunct α_j, α_j = v, then I_α(v) = ∅;

2. if α_i is l1 ∨ ... ∨ lk for k ≥ 2, where at least one literal is positive, say w.l.o.g. that l1 is the positive literal u, then the subsumption relationship (l̄2 ∧ ... ∧ l̄k, u) is in ⪯_α.
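The construction can be written down directly. In the Python sketch below a clause is a list of (letter, is_positive) literals and the function names are hypothetical. Where the construction leaves a choice open (which conjunct α_i to use when no all-negative conjunct exists, and which positive literal plays the role of u) the sketch simply takes the first one.

```python
from typing import Dict, List, Set, Tuple

Literal = Tuple[str, bool]          # (letter, is_positive)
Clause = List[Literal]


def bar(lit: Literal) -> Literal:
    """Complement: bar(¬u) = u and bar(v) = ¬v."""
    return (lit[0], not lit[1])


def reduce_cnf(clauses: List[Clause], letters: List[str]):
    """Map a CNF formula alpha to (T_a, Obj_a, I_a, subsumptions, q_a)."""
    all_negative = [c for c in clauses if not any(pos for _, pos in c)]
    if all_negative:
        # one disjunct v1 ∧ ... ∧ vk for each conjunct ¬v1 ∨ ... ∨ ¬vk
        query = [[(v, True) for v, _ in c] for c in all_negative]
        remaining = [c for c in clauses if c not in all_negative]
    else:
        # no all-negative conjunct: this sketch picks the first conjunct as alpha_i
        query = [[bar(l) for l in clauses[0]]]
        remaining = clauses[1:]
    interp: Dict[str, Set[int]] = {v: set() for v in letters}
    subsumptions = []
    for c in remaining:
        if len(c) == 1:                         # a single letter v (positive here)
            interp[c[0][0]] = {1}
        else:                                   # k >= 2, at least one positive literal
            k = next(i for i, (_, pos) in enumerate(c) if pos)
            tail = [bar(l) for i, l in enumerate(c) if i != k]
            subsumptions.append((tail, c[k][0]))
    return set(letters), {1}, interp, subsumptions, query


if __name__ == "__main__":
    alpha = [[("a2", True)], [("b2", True)],
             [("a1", True), ("a2", False), ("b1", True)],
             [("a1", True), ("b1", True), ("b2", False)],
             [("a1", False)], [("b1", False)]]
    print(reduce_cnf(alpha, ["a1", "a2", "b1", "b2"]))
```

On the formula of the example that follows, the sketch returns q_α = a1 ∨ b1 and I_α(a2) = I_α(b2) = {1}; because it picks the first positive literal of each clause, it produces a2 ∧ ¬b1 ⪯_α a1 and ¬b1 ∧ b2 ⪯_α a1 rather than the subsumptions drawn in Figure 5.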

For instance, the propositional formula

$$\alpha = a2 \land b2 \land (a1 \lor \lnot a2 \lor b1) \land (a1 \lor b1 \lor \lnot b2) \land \lnot a1 \land \lnot b1$$

is mapped into the source shown in Figure 5 and the query q_α = a1 ∨ b1. We now show the following

Lemma 1 1 ∈ ans(q_α, S_α) iff α is unsatisfiable.

In fact, we prove the equivalent form: 1 ∉ ans(q_α, S_α) iff α is satisfiable.

(⇐) Suppose α is satisfiable, and let f be a truth assignment over V satisfying it. Let J be the interpretation of the taxonomy (T_α, ⪯_α) such that, for each term t ∈ V,

$$J(t) = \begin{cases} \{1\} & \text{if } f(t) = T \\ \emptyset & \text{otherwise.} \end{cases}$$

We have that I_α ≤ J, since for each t ∈ V, either I_α(t) is empty, or I_α(t) = {1}. In the former case, I_α(t) ⊆ J(t) whatever J(t) is. In the latter case, we have that α_j = t for some 1 ≤ j ≤ n, which implies f(t) = T (since f satisfies α), which in turn implies J(t) = {1}, and again I_α(t) ⊆ J(t). Moreover, (q, u) ∈ ⪯_α implies J(q) ⊆ J(u). In proof, (q, u) ∈ ⪯_α iff α_k = ¬q ∨ u for some 1 ≤ k ≤ n, which implies f(¬q ∨ u) = T (since f satisfies α) and therefore: either f(¬q) = T and by construction J(q) = ∅, or f(u) = T and by construction J(u) = {1}; in both cases J(q) ⊆ J(u). Hence J is a model of S_α. However, 1 ∉ J(q_α). In fact, by construction, for any disjunct d in q_α, there exists α_j = ¬d for some 1 ≤ j ≤ n. Since f satisfies α, it follows that f satisfies ¬d, so f(d) = F. But then J(d) = ∅ for each disjunct d in q_α, which implies J(q_α) = ∅. So 1 ∉ J(q_α) for a model J, that is, 1 ∉ ans(q_α, S_α).

(⇒) Suppose 1 ∉ ans(q_α, S_α), and let J be a model of S_α such that 1 ∉ J(q_α). Let f be the truth assignment over V defined as follows, for each letter t ∈ V:

$$f(t) = \begin{cases} T & \text{if } J(t) = \{1\} \\ F & \text{otherwise.} \end{cases}$$

By an argument similar to the one developed in the if part of the proof, it can be proved that f satisfies α, and this completes the proof of the Lemma.

From the last Lemma and the NP-completeness of SAT, the coNP-hardness of deciding query answers in neg-extended sources follows. □

We observe that it is essential for the reduction that the query language allows negation. Otherwise, propositional formulae that have no conjunct consisting only of negative literals, such as ¬v1 ∨ ... ∨ ¬vk, could not be reduced.

3.4.2 Adding negation in queries

In this Section, we consider the evaluation of queries containing negation over a source. To this end, we first need to define the extension of a negative literal in an interpretation I. The obvious way of doing so is as follows: I(¬t) = Obj \ I(t). However, as is well known, if we maintain our definition of query answer, namely ans(q, S) = {o ∈ Obj | o ∈ J(q) for all models J of S}, a negative literal in a query is equivalent to the false clause, because there is not enough information in the taxonomy of a source to support a negative fact.

In order to derive an intuitive and, at the same time, logically well-grounded evaluation procedure for extended queries, we need an alternative to the query semantics given by ans. In order to define it, let us consider a logical reformulation of the problem in terms of datalog. We map each term t_i into two predicate symbols:

- an extensional one, denoted C_{t_i}, representing the interpretation of t_i, i.e. I(t_i); and
- an intensional one, denoted Y_{t_i}, representing t_i in the rules encoding the subsumption relation.



The obvious connection between C_{t_i} and Y_{t_i} is that all facts expressed via the former are also true of the latter; this is captured by stating a rule (named "extensional" below) of the form C_{t_i}(x) → Y_{t_i}(x) for each term t_i.

Definition 8 (Source program) Given a source S = (T, ⪯, Obj, I), the source program of S is the set of clauses P_S given by P_S = TR_S ∪ ER_S ∪ F_S, where:

- TR_S = {Y_t(x) :- Y_{t_1}(x), ..., Y_{t_n}(x) | t_1 ∧ ... ∧ t_n ⪯ t} are the terminological rules of P_S;

- ER_S = {Y_{t_i}(x) :- C_{t_i}(x) | t_i ∈ T} are the extensional rules of P_S;

- F_S = {C_{t_i}(o) | o ∈ I(t_i)} are the facts of P_S, stated in terms of constants o which are one-to-one with the elements of Obj (unique name assumption). □
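Definition 8 translates directly into code. The sketch below (Python; the term names and objects in the example are hypothetical) builds TR_S, ER_S and F_S for a source whose tails are conjunctions of terms, and computes the least Herbrand model M_{P_S} by naive bottom-up iteration of the T_P operator. Since every rule uses the single variable x, it suffices to try each constant o in turn. As Proposition 7 below makes precise, o ∈ ans(t, S) exactly when Y_t(o) belongs to this model.

```python
from typing import Dict, List, Set, Tuple

Pred = Tuple[str, str]                  # ("Y", t) stands for Y_t, ("C", t) for C_t
Rule = Tuple[Pred, List[Pred]]          # head(x) :- body_1(x), ..., body_n(x)


def source_program(terms: Set[str], subs: List[Tuple[Set[str], str]],
                   interp: Dict[str, Set[str]]):
    """Return (TR_S + ER_S, F_S) for a source whose tails are conjunctions of terms."""
    tr = [(("Y", t), [("Y", u) for u in sorted(tail)]) for tail, t in subs]  # terminological
    er = [(("Y", t), [("C", t)]) for t in sorted(terms)]                     # extensional
    facts = {(("C", t), o) for t in terms for o in interp.get(t, set())}     # F_S
    return tr + er, facts


def least_model(rules: List[Rule], facts: Set[Tuple[Pred, str]]) -> Set[Tuple[Pred, str]]:
    """Naive bottom-up iteration of T_P until nothing new is derived."""
    model = set(facts)
    constants = {o for _, o in facts}
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            for o in constants:
                if (head, o) not in model and all((b, o) in model for b in body):
                    model.add((head, o))
                    changed = True
    return model


if __name__ == "__main__":
    # Hypothetical source: Animal ∧ FlyingObject ⪯ Bird and Parrot ⪯ Bird.
    terms = {"Animal", "FlyingObject", "Parrot", "Bird"}
    subs = [({"Animal", "FlyingObject"}, "Bird"), ({"Parrot"}, "Bird")]
    interp = {"Parrot": {"o1"}, "Animal": {"o2", "o3"}, "FlyingObject": {"o2"}}
    model = least_model(*source_program(terms, subs, interp))
    print(sorted(o for pred, o in model if pred == ("Y", "Bird")))   # ['o1', 'o2']
```

Naive iteration recomputes every rule at each round; it is chosen here for brevity, not efficiency.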

Next, we translate queries in the language L_T.

Definition 9 (Query program) Given a query q ∈ L_T to a simple source S = (T, ⪯, Obj, I), the query program of q is the set of clauses P_q given by:

{q(x) :- Y_{t_1}(x), ..., Y_{t_k}(x) | t_1 ∧ ... ∧ t_k is a disjunct of q}

where q is a new predicate symbol. □

In order to show the equivalence of the original model with its datalog translation, we state the following:

Proposition 7 For each source S = (T, ⪯, Obj, I) and query q ∈ L_T, ans(q, S) = {o ∈ Obj | P_S ∪ P_q ⊨ q(o)}. □

Let us consider this mapping in light of the new query language L^¬_T. The source program P_S remains a pure datalog program, while the query program P_q of any query q ∈ L^¬_T against S becomes:

{q(x) :- L_{v_1}(x), ..., L_{v_k}(x) | v_1 ∧ ... ∧ v_k is a disjunct of q}

where each L_{v_i} is either Y_{t_i}, if v_i = t_i, or ¬Y_{t_i}, if v_i = ¬t_i (t_i ∈ T).

Queries of this kind are dealt with by using an approximation of the closed-world assumption (CWA), which can be characterized either procedurally, in terms of program stratification, or declaratively, in terms of perfect models. We will adopt the former characterization. In fact, P_q is a datalog¬ program, and so is the program P_S ∪ P_q. The latter program is stratified by the level mapping l defined as follows:



$$l(pred) = \begin{cases} 1 & \text{if } pred \text{ is } q \\ 0 & \text{otherwise} \end{cases}$$



It follows that P_S ∪ P_q has a minimal Herbrand model M_S^q, given (see [H]) by the least fixpoint of the transformation T'_{P_q} applied to M_{P_S}, where M_{P_S} is the least Herbrand model of the datalog program P_S, and T'_P is the extension to datalog¬ of the T_P operator on which the standard semantics of pure datalog is based. The model M_S^q is obtained from M_{P_S} in one iteration, since only instances of q are added at each iteration and q does not occur in the body of any rule. The following definition establishes an alternative notion of answer for queries including negation.
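The one-iteration argument can be illustrated with the sketch below (Python, illustrative names). It assumes that the least model M_{P_S} of stratum 0 has already been computed and is given as a map from each term t to the objects o with Y_t(o) ∈ M_{P_S}; it then applies the rules of P_q once, reading a negative literal ¬Y_t(x) as "Y_t(o) is not in M_{P_S}". Because q occurs in no rule body, a second application would derive nothing new.

```python
from typing import Dict, List, Set, Tuple

Literal = Tuple[str, bool]   # (term, is_positive)


def apply_query_rules(y_ext: Dict[str, Set[str]], objects: Set[str],
                      query: List[List[Literal]]) -> Set[str]:
    """One application of q(x) :- L_1(x), ..., L_k(x) on top of M_{P_S}.

    y_ext[t] is the set of o such that Y_t(o) is in M_{P_S} (stratum 0);
    a negative literal ¬Y_t(x) holds for o when Y_t(o) is absent from M_{P_S}.
    """
    derived: Set[str] = set()
    for disjunct in query:                  # one rule per disjunct of q
        for o in objects:
            if all((o in y_ext.get(t, set())) == positive for t, positive in disjunct):
                derived.add(o)              # q(o) is derived
    return derived
```

For instance, with Y_Bird true of o1 and o2 and Y_Parrot true of o1 only, the query Bird ∧ ¬Parrot derives q(o2).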

Definition 10 (Extended answer) Given an extended query q to a source S = (T, ⪯, Obj, I), the extended answer to q in S, denoted ε(q, S), is given by: ε(q, S) = {o ∈ Obj | M_S^q ⊨ q(o)}. □

We conclude by showing how extended answers can be computed. 
Proposition 8 For each source S = (T, ⪯, Obj, I) and query q ∈ L^¬_T, ε(q, S) is given by:

1. ε(q ∨ d, S) = ε(q, S) ∪ ε(d, S),

2. ε(l ∧ d, S) = ε(l, S) ∩ ε(d, S),

3. ε(t, S) = ans(t, S),

4. ε(¬t, S) = Obj \ ε(t, S). □



From a practical point of view, computing ε(¬t_1 ∧ ... ∧ ¬t_k, S) requires computing:

$$\mathit{Obj} \setminus (\mathit{ans}(t_1, S) \cup \dots \cup \mathit{ans}(t_k, S))$$

which in turn requires knowing Obj, i.e. the whole set of objects of the network. As this knowledge may not be available, or may be too expensive to obtain, one may want to resort to a query language making restricted use of negation, for instance by forcing each query disjunct to contain at least one positive term.
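The two evaluation strategies can be contrasted in a short sketch (Python; the function names and the example data are hypothetical). eps follows the four equations of Proposition 8 and needs Obj for every negative literal. eps_restricted assumes each disjunct contains at least one positive term and computes the same set as the intersection of the answers of the positive terms minus the union of the answers of the negated ones, without ever consulting Obj. Both take ans(t, S) for each term as already computed.

```python
from typing import Dict, List, Set, Tuple

Literal = Tuple[str, bool]   # (term, is_positive)


def eps(query: List[List[Literal]], ans: Dict[str, Set[str]], obj: Set[str]) -> Set[str]:
    """Extended answer following Proposition 8; negative literals need Obj."""
    result: Set[str] = set()
    for disjunct in query:                                   # equation 1: union
        d = set(obj)
        for t, positive in disjunct:                         # equation 2: intersection
            d &= ans.get(t, set()) if positive else obj - ans.get(t, set())  # eqs. 3, 4
        result |= d
    return result


def eps_restricted(query: List[List[Literal]], ans: Dict[str, Set[str]]) -> Set[str]:
    """Same value when every disjunct has a positive term; Obj is never used."""
    result: Set[str] = set()
    for disjunct in query:
        positives = [t for t, p in disjunct if p]
        if not positives:
            raise ValueError("each disjunct must contain at least one positive term")
        d = set.intersection(*(set(ans.get(t, set())) for t in positives))
        for t, p in disjunct:
            if not p:
                d -= ans.get(t, set())
        result |= d
    return result
```

The two functions agree on restricted queries because, with every ans(t, S) ⊆ Obj, intersecting with Obj \ ans(t, S) is the same as subtracting ans(t, S).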

3.5 Disjunctive information sources 

In this section we consider disjunctive sources, whose taxonomies allow subsumption relationships between queries. Formally, a disjunctive taxonomy is a pair (T, ⪯_d) where T is a terminology and ⪯_d ⊆ (L_T × L_T) is reflexive and transitive. A disjunctive source S is a 4-tuple (T, ⪯_d, Obj, I) where (T, ⪯_d) is a disjunctive taxonomy and (Obj, I) is an interpretation for it.

Disjunctive sources may not have a unique minimal model. As an example, the source (T, ⪯_d, Obj, I) where:

- T = {a1, a2, b1, b2},

- ⪯_d = {(a2, a1 ∨ b1), (b2, a1 ∨ b1)},

- Obj = {1}, and

- I = {(a2, {1}), (b2, {1})}

has two minimal models, I_1 = I ∪ {(a1, {1})} and I_2 = I ∪ {(b1, {1})}.
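The two minimal models can be found by brute force, which is feasible here because Obj = {1} and there are four terms. The sketch below (Python) reads (a2, a1 ∨ b1) ∈ ⪯_d as requiring J(a2) ⊆ J(a1) ∪ J(b1), requires models to extend I, and compares interpretations pointwise; these readings are assumptions of the sketch.

```python
from itertools import product

OBJ = {1}
TERMS = ["a1", "a2", "b1", "b2"]
STORED = {"a2": {1}, "b2": {1}}                      # I
SUBS = [("a2", ["a1", "b1"]), ("b2", ["a1", "b1"])]  # (tail term, head disjunction)


def is_model(J):
    """J extends I and J(tail) ⊆ J(h_1) ∪ ... ∪ J(h_m) for every subsumption."""
    extends_i = all(STORED.get(t, set()) <= J[t] for t in TERMS)
    return extends_i and all(J[tail] <= set().union(*(J[h] for h in head))
                             for tail, head in SUBS)


candidates = [dict(zip(TERMS, c)) for c in product([set(), {1}], repeat=len(TERMS))]
models = [J for J in candidates if is_model(J)]
minimal = [J for J in models
           if not any(K != J and all(K[t] <= J[t] for t in TERMS) for K in models)]

for J in minimal:
    print({t: sorted(J[t]) for t in TERMS})
```

Each of a1 and b1 is empty in some model, so 1 is in the answer of neither term alone, while 1 ∈ J(a1) ∪ J(b1) holds in every model.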

Losing the uniqueness of the minimal model is enough to make query evaluation for this kind of source computationally difficult.

Proposition 9 Deciding whether an object o ∈ Obj is in the answer of a query q ∈ L_T in a disjunctive source S, i.e. whether o ∈ ans(q, S), is a coNP-hard problem.

The proof is similar to that of Proposition 6. For brevity, we just show the reduction from SAT. Let α be as in the proof of Proposition 6. We map α into a disjunctive source S_α = (T_α, ⪯_α, Obj_α, I_α) and a query q_α as follows:



- T_α = V;

- Obj_α = {1};

- the query q_α is given by

$$q_\alpha = \bigvee\{\, v_1 \land \dots \land v_k \mid \lnot v_1 \lor \dots \lor \lnot v_k \text{ is a conjunct in } \alpha \,\} \lor \bigvee\{\, \lnot u_1 \land \dots \land \lnot u_k \mid u_1 \lor \dots \lor u_k \text{ is a conjunct in } \alpha \,\}$$

(v_i, u_i ∈ V). If there are no such conjuncts ¬v1 ∨ ... ∨ ¬vk or u1 ∨ ... ∨ uk in α, then let α_i be l1 ∨ ... ∨ lk; we then set q_α = l̄1 ∧ ... ∧ l̄k, where the complement of ¬u is u and the complement of v is ¬v;

- for each remaining conjunct α_i in α:

1. if α_i is a letter v, then I_α(v) = {1}; if for no conjunct α_j, α_j = v, then I_α(v) = ∅;

2. if α_i is ¬v1 ∨ ... ∨ ¬vj ∨ u1 ∨ ... ∨ um, where j, m ≥ 1, then the subsumption relationship (v1 ∧ ... ∧ vj, u1 ∨ ... ∨ um) is in ⪯_α.

In the present case, the propositional formula

$$\alpha = a2 \land b2 \land (a1 \lor \lnot a2 \lor b1) \land (a1 \lor b1 \lor \lnot b2) \land \lnot a1 \land \lnot b1$$

is mapped into the source shown in the previous example. It can be shown that 1 ∈ ans(q_α, S_α) iff α is unsatisfiable.

4 Networks of Information Sources 

In this Section we introduce networks of information sources. The model is first outlined, and then query 
evaluation is considered. 

4.1 The model 

In order to be a component of a networked information system, a source is endowed with additional subsumption relations, called articulations, which relate the source terminology to the terminologies of other sources of the same kind.

Definition 11 (Articulation) Given two terminologies T and U, an articulation from T to U, ⪯_{T,U}, is a non-empty binary relation from L_U to T, such that q ⪯_{T,U} t implies that q is a conjunctive query. □

An articulation relationship is not syntactically different from a subsumption relationship, except that its head may be a term of a terminology different from the one the terms in its tail come from.

Definition 12 (Articulated Source) An articulated source S over k > 0 disjoint terminologies T_1, ..., T_k is a 5-tuple S = (T_S, ⪯_S, Obj, I_S, R_S), where:

- (T_S, ⪯_S, Obj, I_S) is a source;

- R_S is a set of articulations R_S = {⪯_{T_S,T_1}, ..., ⪯_{T_S,T_k}}. □

Articulations are used to connect an articulated source to other articulated sources, thus creating a networked information system. An articulated source S with an empty stored interpretation, i.e. I_S(t) = ∅ for all t ∈ T_S, is called a mediator in the literature.

Definition 13 (Network) A network of articulated sources, or simply a network, N is a non-empty set of articulated sources N = {S_1, ..., S_n}, where each S_i is articulated over the terminologies of some of the other sources in N, and all terminologies T_{S_1}, ..., T_{S_n} of the sources in N are disjoint. □
Notice that the domain Obj of the interpretation of an articulated source is independent of the source, and is thus the same for every articulated source. This is not necessary for our model to work; it just reflects a typical situation of networked resources such as URLs. Relaxing this constraint would have no impact on the results reported in the present study.

Since in a network (a) there is no source acting at the global level, (b) all sources store data, and (c) as we will see, data are exchanged via direct communication, each source can be seen as, and will in fact be called, a peer, and the network as a peer-to-peer information system. Articulations of the network peers will also be referred to as P2P mappings.

An intuitive way of interpreting a network is to view it as a single source which is distributed along 
the nodes of a network, each node dealing with a specific vocabulary. The global source can be logically 
constructed by removing the barriers which separate local sources, as if (virtually) collecting all the network 
information in a single repository. The notion of network source captures this interpretation of a network. 

Definition 14 (Network source) The network source S_N of a network of articulated sources N = {S_1, ..., S_n} is the source

S_N = (T_N, ⪯_N, Obj, I_N), where:

$$T_N = \bigcup_{i=1}^{n} T_{S_i}, \qquad \preceq_N = \Big(\bigcup_{i=1}^{n} \sqsubseteq_{S_i}\Big)^{*}, \qquad I_N = \bigcup_{i=1}^{n} I_{S_i},$$

where ⊑_{S_i} is the total subsumption of the source S_i, given by the union of the subsumption relation ⪯_{S_i} with all articulations of the source, that is:

$$\sqsubseteq_{S_i} = \preceq_{S_i} \cup \bigcup R_{S_i},$$

and A* denotes the transitive closure of the binary relation A. □

It is not difficult to see that ⪯_N is reflexive and transitive, and every non-trivial subsumption relationship in it relates a conjunctive query in any one of the terminologies T_{S_1}, ..., T_{S_n} to a single term. Thus, S_N is indeed a source. Such a source emerges in a bottom-up manner from the articulations of the peers. This distinguishes peer-to-peer systems from federated distributed databases.

A network query q is a query in any one of the query languages supported by the network, that is q ∈ L_{T_{S_i}} for some i ∈ [1, n]. As will become evident, the method that we set up requires only minor modifications to also evaluate queries in the language L_{T_N}, that is, queries that mix terms from different terminologies. We do not provide this facility because it does not seem to make much sense in our vision.

The answer to a network query q, or network answer, is given by ans(q, S_N).

Figure 6 presents the taxonomy of a network source S_N, where N consists of 3 peers N = {Pa, Pb, Pc}. As can be verified, this is the same taxonomy as the one shown in Figure [Jl, except that now some of its subsumption relationships are elements of articulations.




Figure 6: A network taxonomy 
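Definition 14 can be computed directly for a small network. The sketch below (Python) unions the terminologies and stored interpretations of the peers and closes the union of local subsumptions and articulations under transitivity, treating a term t as the one-term conjunctive query {t}, so that (q, t) and ({t}, u) yield (q, u). The dictionary layout of a peer and the three example peers are hypothetical (they are not the network of Figure 6), and reflexive pairs are left implicit.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Pair = Tuple[FrozenSet[str], str]      # (conjunctive tail, head term)


def network_source(peers: List[dict]) -> Tuple[Set[str], Dict[str, Set[str]], Set[Pair]]:
    """Return (T_N, I_N, ⪯_N), the latter without reflexive pairs."""
    terms: Set[str] = set()
    interp: Dict[str, Set[str]] = {}
    pairs: Set[Pair] = set()
    for p in peers:
        terms |= p["terms"]
        for t, objs in p["interp"].items():
            interp.setdefault(t, set()).update(objs)
        pairs |= set(p["subs"]) | set(p["articulations"])   # total subsumption of the peer
    while True:                                              # transitive closure
        new = {(q, u) for q, t in pairs for tail, u in pairs if tail == frozenset({t})}
        if new <= pairs:
            return terms, interp, pairs
        pairs |= new


def example_peers() -> List[dict]:
    """Three hypothetical peers: Pa articulates terms of Pb, and Pb terms of Pc."""
    pa = {"terms": {"a1", "a2"}, "interp": {"a1": {"o1"}},
          "subs": [(frozenset({"a1"}), "a2")],
          "articulations": [(frozenset({"b1", "b2"}), "a1")]}
    pb = {"terms": {"b1", "b2"}, "interp": {"b1": {"o2", "o3"}, "b2": {"o3"}},
          "subs": [], "articulations": [(frozenset({"c1"}), "b1")]}
    pc = {"terms": {"c1"}, "interp": {"c1": {"o4"}}, "subs": [], "articulations": []}
    return [pa, pb, pc]
```

In the example, the articulation b1 ∧ b2 ⪯ a1 of Pa combines with the local subsumption a1 ⪯ a2 into the derived relationship b1 ∧ b2 ⪯_N a2, which no single peer stores.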
4.2 Network query evaluation

This Section presents a network query evaluation procedure based on the method devised in the centralized case. First, a functional model of each peer is introduced, then the algorithms corresponding to the operations on the interface of the peer are given. Correctness and complexity of these algorithms are discussed in Section 4.3, while Section 5 concludes by considering optimization issues.

4.2.1 The functional model of a peer 

In order to illustrate our query evaluation procedure, we now define a peer from a functional point of view. 
In this respect, we see a peer as a software component uniquely identified in the network by a peer ID. The 
interface of a peer exposes just one method: 

- Query, which takes as input a network query q and evaluates it, returning the set of objects ans(q, S_N).

The user (whether a human or an application program) is supposed to use this method for the evaluation of network queries. We assume that q is expressed in the query language of the peer. As will be argued in due course, this assumption can be relaxed without any substantial change to our framework.

In addition to Query, a peer has methods for sending messages to and receiving messages from other peers. We do not enter into the details of these methods: there are several options, which make no difference from the point of view of our model. Instead, we detail the types of messages that can be exchanged between peers. These are of one of the following two types:

- Ask: by sending a message of this kind to a peer P, the present peer asks P to evaluate a term query in P's query language. The receiving peer P processes Ask messages according to the Qe procedure (Figure lU), as we will see in detail below. An Ask message has the following fields:

— PID: the id of the present peer, which is sending the message; 

— QID: the id of the query that PID is sending for evaluation; 

— t: the query term of QID; 

— A: the set of already visited terms. These last two parameters are those of the Qe procedure.

- Tell: by sending a message of this kind to a peer P, the present peer returns to P the result of the evaluation of a term query which had previously been Ask-ed by P. A Tell message has the following fields:

— QID: the ID of the query whose result is being returned; 

— RES: the set of objects resulting from the evaluation of QID. 

We will denote the sending of a message m of one of these two kinds to the peer P as P: m(field values). By decoupling the request for evaluation from the return of the result, we aim at minimizing the number of sessions open at any time between peers, thus removing a serious obstacle to scalability. Query does not follow this paradigm since it involves only a local interaction.

Each peer processes the incoming messages depending on their type and content. In order to carry out this work, the peer keeps a (query) log, that is, a set of objects, each associated with a query in whose evaluation the peer is currently involved. A log object has the following attributes:

- PID: the id of the peer who sent the query (can be the local peer itself); 

- QID: the id of the query; 

- t: the query term (we recall that we need to deal only with term queries); 

- n: the number of open calls in QID (see below);
- QP: the query program representing the current status of evaluation of QID. A query program is a set of sub-programs {SP_1, ..., SP_k} where each sub-program SP_j is a set of calls. A call is a sub-query of QID, and can be:

— open, meaning that the sub-query is being evaluated, in which case the call is the sub-query id; 
or 

— closed, meaning the sub-query has been evaluated, in which case the call is the resulting set of 
objects. 

Since no two log objects can have the same query id, we will represent a log object as a 5-tuple (PID, QID, t, n, QP).

4.2.2 Query 

Let us assume that the input query q posed to a peer S is given by

q = c_1 ∨ ... ∨ c_m

where each c_i is a conjunctive query. As a first step, Query reduces q to a term query t by generating a new term t not in T_S and inserting a new hyperedge (c_i, t) into the local taxonomy B-graph (i.e. the one corresponding to (T_S, ⪯_S)), for each conjunctive query c_i in q. This work is carried out by the function Modify-taxonomy, which returns the newly generated term t. A new query id for t is subsequently obtained by Query, and an Ask message is sent to the peer itself for evaluating t. As required by Qe, the set of already visited terms consists just of t itself. At this point Query waits on the log until the log object associated with the query t is closed, that is, until the number of its open calls is 0. Notice that this object is created only after the Ask message sent on line 3 is processed, but this creates no problem, as all Query has to do in the meantime is wait. When the log object is finally closed, Query retrieves it and deletes it from the log by using the function Delete, which returns the object itself. When the object is closed, its query program, that is the value of its last field, equals ans(t, S_N). This value is assigned to the variable R. On line 6, the subsumption relationships inserted by Modify-taxonomy are removed by Cleanup-taxonomy, and R is finally returned.

Query(q : query);
1. t ← Modify-taxonomy(q)
2. ID ← New-query-id
3. self: Ask(self, ID, t, {t})
4. wait until ID is closed then
5.   (PID, QID, t, n, R) ← Delete(ID)
6.   Cleanup-taxonomy(t)
7.   return(R)

Figure 7: The Query procedure 

As an example, let us consider the network shown in Figure 6, whose corresponding B-graph is shown in Figure 8, and the query (a2 ∧ a3) on peer Pa. When given as input to Query, this query is passed on to Modify-taxonomy, which adds the hyperedge ({a2, a3}, t) to the taxonomy B-graph and returns the newly generated term t. Let us assume that q1 is the id of the new query. Query then sends the message Ask(Pa, q1, t, {t}) to itself, and enters the wait loop until the query is evaluated.

4.2.3 Ask 

For readability, we will describe Ask and Tell as if they were methods whose parameters are the message fields. Ask (Figure 9) uses the following variables:

- n : counts how many sub-queries the input query QID generates; 
Figure 8: A network taxonomy B-graph



- QP: the initial query program of QID;

- Q: a queue holding the information needed to send the Ask messages required to evaluate QID;

- C: the query sub-program currently being computed.

After initialization, Ask performs (line 2) the same test as Qe, looking for a hyperedge h in the local B-graph whose head is the given term t and whose tail is disjoint from A. If no such hyperedge is found, then n remains 0, the test on line 10 fails, and the result of the evaluation of the given term query t is just I(t) (as Qe establishes), which Ask returns by sending a Tell message to the invoking peer PID (line 15). If instead a hyperedge h is found, then, according to Qe, the intersection of the evaluations of the terms u_i in its tail should be added to the result. In order to achieve the same behavior, Ask enters a loop in which it processes each term u_i in order to construct in C the query sub-program associated with h. First, a new query id ID is generated (line 5) to denote the sub-query on u_i; the newly generated id is then added to C. On line 7, the number of open calls is increased by one, and on line 8 the information required to evaluate the query u_i is enqueued in Q. This information is:

- the id of the peer P_h holding the terms in the tail of the hyperedge h; we assume this information is stored with the hyperedge just for convenience, but the peer can also store it separately;

- the ID of the sub-query;

- the query term u_i; and

- the set of visited terms A ∪ {u_i}, as in Qe.

Each sub-program so generated is added to QP (line 9). Once all relevant hyperedges have been considered, if the number of open calls is positive, Ask uses the function Persist to create the log object representing the query QID and to persist it in the log. Once the log object has been persisted, Ask must launch the evaluation of the generated sub-queries, which it does in the loop on lines 12-14: until Q is empty, it dequeues the information for constructing an Ask message for each sub-query, and sends that message to the peer P_h. The value of the first message field is the identity of the peer itself (self), as the invoking peer.

At this point, it can easily be verified that the assumption that all terms in the tail of a hyperedge come from the same terminology, namely that of peer P_h, can be relaxed without any impact on the query evaluation procedure. In logical terms, this is the assumption that the conjunctive queries on the left-hand side of subsumption relationships belong to the query language of one peer. We have made this assumption because it fits our vision of a network. But Ask can also easily work with hyperedges whose tails have terms from different terminologies: all that is required is to store the id of the peer holding each term, rather than the id of the peer holding the whole hyperedge.

Let us resume our running example. Upon processing the message Ask(Pa, q1, t, {t}), Ask finds that the hyperedge h = ({a2, a3}, t) passes the test on line 2, and enters the loop on the tail of h. For term a2, assuming the generated query id is q2, the record (Pa, q2, a2, {t, a2}) is enqueued in Q, while for term a3 (generated id q3) the record (Pa, q3, a3, {t, a3}) is enqueued. As there are no more hyperedges and n = 2, a new log object is created to represent the query t. The attributes of this object are:
Ask(PID, QID : id; t : term; A : set of terms);
1. n ← 0; QP, Q ← ∅
2. for each hyperedge h = ({u_1, ..., u_r}, t) such that {u_1, ..., u_r} ∩ A = ∅ do
3.   C ← ∅
4.   for each u_i ∈ {u_1, ..., u_r} do
5.     ID ← New-query-id
6.     C ← C ∪ {ID}
7.     n ← n + 1
8.     Enqueue(Q, (P_h, ID, u_i, A ∪ {u_i}))
9.   QP ← QP ∪ {C}
10. if n > 0 then
11.   Persist(PID, QID, t, n, QP)
12.   until Q = ∅ do
13.     (P_h, ID, u, B) ← Dequeue(Q)
14.     P_h: Ask(self, ID, u, B)
15. else PID: Tell(QID, I(t))



Figure 9: The procedure to process AsK messages 

- PID = Pa

- QID = q1

- t = t

- n = 2

- QP = {{q2, q3}}.

Now two Ask messages are sent to Pa:

1. Ask(Pa, q2, a2, {t, a2}), and

2. Ask(Pa, q3, a3, {t, a3}).

Let us see how the latter message is processed. Since there are no incoming hyperedges into term a3, n remains 0, and the processing of the message concludes by sending the message Tell(q3, I(a3)) to Pa.

4.2.4 Tell 

When a peer receives a Tell(QID, R) message (see Figure 10), QID is an open call of some log object in the peer's log, in the program of some term (sub)query t with id QID1. Then, as a first action, the peer retrieves this object by using the Delete1 function, which takes QID as input, returns the object and deletes it from the log. Notice that there is exactly one object having QID as an open call, since Ask generates a new id for each sub-query it identifies, as we have already seen. After retrieving the log object, Tell uses Close to modify the query program QP in it by closing the open call QID: this means replacing QID by R, obtaining a new query program QP1. On line 3, the number of open calls of the log object is tested: if it is 1, then the call just closed was the last open one in query QID1; in this case, the result of QID1 is computed in S by Compute-answer. For a given program:

$$QP = \{SP_1, \dots, SP_m\}$$

where each sub-program SP_j is given by a collection of object sets:

$$SP_j = \{R^j_1, \dots, R^j_{k_j}\}$$

Compute-answer returns:

$$S = \bigcup \{\, R^j_1 \cap \dots \cap R^j_{k_j} \mid 1 \le j \le m \,\}$$
The set S ∪ I(t) is exactly what the Qe procedure computes. If t is not in the terminology of the peer (t ∉ T_self), then QID1 is the id of the original query q. Thus, I(t) = ∅ and S = ans(t, S_N). Therefore, the object (PID, QID1, t, 0, S) is persisted in the log (line 5), indicating to Query(q) (Figure 7) that the evaluation of the query q has finished. Otherwise, the result S ∪ I(t) so obtained is Tell-ed (line 6) to the peer PID which, according to the log object, was the one that asked for the evaluation of QID1. Notice that this may fire another Tell message, in case QID1 is the last open call of some other query. If the test on line 3 fails, then there are still open calls in the log object, which is therefore persisted back by Persist on line 7, after decreasing the number of open calls in it and replacing the query program QP by the updated one QP1.

Tell(QID : ID; R : set of objects);
1. (PID, QID1, t, n, QP) ← Delete1(QID)
2. QP1 ← Close(QP, QID, R)
3. if n = 1 then
4.   S ← Compute-answer(QP1)
5.   if t ∉ T_self then Persist(PID, QID1, t, 0, S)
6.   else PID: Tell(QID1, S ∪ I(t))
7. else Persist(PID, QID1, t, n − 1, QP1)

Figure 10: The procedure to process Tell messages 

In our example, the message Tell(q3, I(a3)) is received by peer Pa. The function Delete1 returns the log object (Pa, q1, t, 2, {{q2, q3}}), the only one that has the open call q3. Close produces the new query program {{q2, I(a3)}}, and since n is not 1, the following modified log object is persisted:

(Pa, q1, t, 1, {{q2, I(a3)}}).

The example is completed in the appendix.

4.3 Correctness and complexity 

As argued above, the combined action of the procedures processing Ask and Tell messages is equivalent to the behavior of the procedure Qe. To see why in more detail, it suffices to consider the following facts:

1. An Ask message is generated for each recursive call performed by Qe, and vice versa: whenever Qe would perform a recursive call, an Ask message is generated. This is guaranteed by the fact that the test on line 2 of Ask is the same as the test on line 3 of Qe. Therefore, the number of Ask messages is the same as the number of terms that can be found on a B-path from t.

2. For each Ask message, at most one log object is generated and persisted. 

3. For each Ask message, a Tell message results, and no more. This can be observed by considering 
that, for each processed Ask message, there can be two cases: 

(a) no hyperedge is found that passes the test on line 2 of Ask: in this case, no subsequent Ask message is generated, and a Tell message is generated;

(b) at least one hyperedge passes the test: in this case a number of sub-queries is generated and 
registered in the query program of the log object. Each such sub-query is evaluated by issuing 
an Ask message with a larger set of visited terms. Since the B-graph is finite, eventually each 
sub-query will lead to a term falling in the previous case (this is how Qe terminates). When all 
sub-queries of a given term query t are closed, the number of open calls of t goes down to 0, and 
Tell issues another Tell message on t. This will propagate closure up, until all open calls are 
closed. 

4. Finally, the Compute- answer procedure performs the same operation on the result of sub-queries as 
Qe does on the results of its recursive calls. 
As a consequence of these facts we have the correctness of the network query evaluation procedure, and also its efficiency. In fact, the total number of messages generated is twice the number of terms visited by Qe, and the number of log objects is no larger than that.
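The equivalence argued above can be exercised in a single-process simulation. The Python sketch below is a hypothetical rendering of Figures 7, 9 and 10, not the authors' implementation: a FIFO queue stands in for the network, query ids come from one global counter, the log is a dictionary keyed by QID, Close and Compute-answer are inlined, and a fresh Python object plays the new term t. The three-peer network is made up; it contains the cycle a2 ⪯ b2 ⪯ b1 ⪯ a2, which the visited set cuts. For the query (a2 ∧ a3) ∨ a1 on peer A, demo() compares the distributed answer with a centralized Qe and reports the messages sent; in this sketch there is one Tell fewer than Ask, because the answer of the top-level query is persisted (line 5 of Figure 10) rather than Tell-ed.

```python
from collections import Counter, deque
from itertools import count


class Network:
    """Stand-in for message passing: a FIFO of (destination peer, kind, arguments)."""

    def __init__(self):
        self.peers, self.queue, self.ids, self.sent = {}, deque(), count(1), Counter()

    def new_id(self):
        return f"q{next(self.ids)}"

    def send(self, dest, kind, *args):
        self.sent[kind] += 1
        self.queue.append((dest, kind, args))

    def run(self):
        while self.queue:
            dest, kind, args = self.queue.popleft()
            getattr(self.peers[dest], kind)(*args)


class Peer:
    def __init__(self, net, pid, terms, interp, edges):
        self.net, self.pid, self.terms, self.interp = net, pid, set(terms), interp
        self.edges = list(edges)   # (peer holding the tail, frozenset tail, head)
        self.log = {}              # QID -> [PID, t, n, QP]
        net.peers[pid] = self

    def ask(self, pid, qid, t, visited):                         # Figure 9
        n, qp, pending = 0, [], deque()
        for tail_peer, tail, head in self.edges:
            if head == t and not (tail & visited):
                c = []
                for u in tail:
                    sub = self.net.new_id()
                    c.append(sub)
                    n += 1
                    pending.append((tail_peer, sub, u, visited | {u}))
                qp.append(c)
        if n > 0:
            self.log[qid] = [pid, t, n, qp]
            while pending:
                dest, sub, u, b = pending.popleft()
                self.net.send(dest, "ask", self.pid, sub, u, b)
        else:
            self.net.send(pid, "tell", qid, set(self.interp.get(t, set())))

    def tell(self, sub_qid, res):                                # Figure 10
        qid = next(k for k, (_, _, n, qp) in self.log.items()    # Delete1
                   if n > 0 and any(c == sub_qid for sp in qp for c in sp))
        pid, t, n, qp = self.log.pop(qid)
        qp1 = [[res if c == sub_qid else c for c in sp] for sp in qp]   # Close
        if n > 1:
            self.log[qid] = [pid, t, n - 1, qp1]
            return
        s = set().union(*(set.intersection(*sp) for sp in qp1))        # Compute-answer
        if t not in self.terms:
            self.log[qid] = [pid, t, 0, s]                       # top-level query done
        else:
            self.net.send(pid, "tell", qid, s | self.interp.get(t, set()))

    def query(self, disjuncts):                                  # Figure 7
        t = object()                                             # fresh term
        self.edges += [(self.pid, frozenset(d), t) for d in disjuncts]
        qid = self.net.new_id()
        self.net.send(self.pid, "ask", self.pid, qid, t, frozenset({t}))
        self.net.run()                                           # "wait until closed"
        result = self.log.pop(qid)[3]
        self.edges = [e for e in self.edges if e[2] is not t]    # Cleanup-taxonomy
        return result


def qe(edges, interp, t, visited):
    """Centralized Qe over (tail, head) pairs, for comparison."""
    res = set(interp.get(t, set()))
    for tail, head in edges:
        if head == t and not (tail & visited):
            res |= set.intersection(*(qe(edges, interp, u, visited | {u}) for u in tail))
    return res


def demo():
    net = Network()
    pa = Peer(net, "A", {"a1", "a2", "a3"}, {"a1": {1}, "a2": {2}, "a3": {3, 4}},
              [("B", frozenset({"b1"}), "a2"), ("C", frozenset({"c1"}), "a3")])
    Peer(net, "B", {"b1", "b2"}, {"b1": {5}, "b2": {6}},
         [("B", frozenset({"b2"}), "b1"), ("A", frozenset({"a2"}), "b2")])
    Peer(net, "C", {"c1"}, {"c1": {5, 7}}, [])
    query = [{"a2", "a3"}, {"a1"}]
    distributed = pa.query(query)
    edges = [(tail, head) for p in net.peers.values() for _, tail, head in p.edges]
    edges += [(frozenset(d), "t") for d in query]
    interp = {k: v for p in net.peers.values() for k, v in p.interp.items()}
    return distributed, qe(edges, interp, "t", frozenset({"t"})), net.sent
```

Because the simulation is sequential, the "wait" of Query is replaced by draining the queue; in a real deployment Ask and Tell would arrive asynchronously.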

5 Optimization issues 

So far, we have focused on correctness. In this Section we discuss optimization. There are many techniques that are potentially useful to this end. For instance, when sub-queries return large results, their closing (performed by Close) and the computation of their results (Compute-answer) should be done with care. However, dealing with all the relevant optimization techniques goes beyond the scope of this paper. Instead, we focus on caching (Section 5.1), which is applicable to all situations, and on exploiting data structures employed in structured P2P systems, namely Distributed Hash Tables (as in Chord [36]). This latter issue is tackled in Sections 5.2 and 5.3; besides showing how to further improve the efficiency of the system, the ensuing discussion hints at how to extend the applicability of our model, and highlights the relationship with a large part of the literature on P2P systems. More on related work can be found in Section 6.

5.1 Caching 

A strong point of our model is that the adoption of caches could significantly speed up the evaluation of queries, by reducing both latency and network traffic. This is because the set of queries that a peer can send to its articulated peers is bounded in size and can be determined in advance: it comprises all "foreign" queries of the peer, i.e. the queries that appear as left-hand sides in the peer's articulations. Note that the number of queries that a peer can propagate to its neighbors is unbounded in other models of P2P systems, for example in Gnutella, where each peer propagates whatever query it receives. It follows that the caches of our model will enjoy higher hit ratios than those of other P2P models, for the same cache size. The subsequent subsections present three caching policies, namely:

- caching answers of local terms, 

- caching answers of local terms and pushing answers of articulation tails, and 

- caching answers of articulation heads. 

5.1.1 Caching answers of local terms

According to this caching policy, each peer S caches pairs of the form (t, ans(t, S_N)), where t is a term in
the peer's terminology T_S. If there are no memory limitations for caches, then after a while each peer will
have cached its whole terminology, and query evaluation reduces to locally computing the extension of the
query by taking unions and intersections of the extensions of the peer's terms. In other words, any peer will be able
to evaluate network queries over its own taxonomy without sending any message to the network^! This is of
course the idealistic case. In general, only some terms (possibly none) will be cached in each peer. Under
these circumstances, when a peer S receives an Ask message for a term query t, the Ask procedure checks
which of the answers for the term (sub)queries needed for the evaluation of t are in the cache, and issues
Ask messages only for evaluating the remaining terms.
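The cache check just described can be sketched in Python. This is a minimal, single-process sketch under our own assumptions, not the authors' implementation: every hyperedge whose head is t is given as a (tail, peer) pair, the cache is a dict from terms to answer sets, query ids come from a local counter, and the name plan_sub_queries is hypothetical. Cached answers go straight into the conjunct built for a hyperedge, while each uncached term receives a fresh query id and a scheduled Ask.

```python
from itertools import count

# Hypothetical local source of query ids (stands in for New-query-id).
_query_ids = count(1)


def plan_sub_queries(hyperedges, cache, visited):
    """Build the query sub-program QP for one term query t.

    hyperedges: list of (tail, peer) pairs, tail being a tuple of terms
    u1..ur whose conjunction is subsumed by t, and peer the owner P_h.
    visited: the set A of terms already on the current evaluation path.
    Returns QP (one conjunct list per hyperedge) and the Ask messages to send.
    """
    qp, outgoing = [], []
    for tail, peer in hyperedges:
        if set(tail) & visited:            # cycle check on the tail terms
            continue
        conjunct = []
        for u in tail:
            if u in cache:
                conjunct.append(cache[u])  # cached answer: a set of objects
            else:
                qid = next(_query_ids)
                conjunct.append(qid)       # placeholder, closed by a later Tell
                outgoing.append((peer, qid, u, visited | {u}))
        qp.append(conjunct)
    return qp, outgoing
```

Returning the outgoing Ask messages instead of sending them keeps the sketch testable; a peer would dispatch them over the network.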

The modified query evaluation algorithms supporting this caching policy are part of the algorithms
for the more general policy described in Section 5.1.2.

^FreeNet tries to improve the situation by forwarding queries (and new objects too) only to those peers that, according to 
the contents of the cache, have similar keys. In this way, each cache tends to have entries about similar keys and this tends to 
improve the quality of routing over time. 

^ Apart from those required for re-evaluating queries when updates occur.
ASKc(PID, QID: ID; t: term; A: set of terms);

1. if t is cached then PID:TELLc(QID, t, ans(t, S_N))
2. else (if |A| = 2 then add t into TO-BE-CACHED log)      // t is a term of the original query q
3.    n ← 0; QP, Q, S ← ∅
4.    for each hyperedge h = ({u1, ..., ur}, t) such that {u1, ..., ur} ∩ A = ∅ do
5.       if Ph ≠ self and u1 ∧ ... ∧ ur is cached then C ← {ans(u1 ∧ ... ∧ ur, S_N)}
6.       else C ← ∅
7.          for each ui do
8.             if Ph = self and ui is cached then
9.                C ← C ∪ {ans(ui, S_N)}
10.            else
11.               ID ← New-query-id
12.               C ← C ∪ {ID}
13.               n ← n + 1
14.               Enqueue(Q, (Ph, ID, ui, A ∪ {ui}))
15.      QP ← QP ∪ {C}
16.   if n > 0 then
17.      Persist(PID, QID, t, n, QP)
18.      while Q ≠ ∅ do
19.         (Ph, ID, u, B) ← Dequeue(Q)
20.         Ph:ASKc(self, ID, u, B)
21.   else if QP ≠ ∅ then S ← Compute-answer(QP)
22.      PID:TELLc(QID, t, S ∪ I(t))      // executed whether or not QP is empty

Figure 11: The procedure to process Ask messages with cache 

5.1.2 Caching answers of local terms and pushing answers of articulation tails 

A complementary scenario, best suited for a P2P system that offers recommendation services in a push-style
manner, is to assume that each peer S also knows the articulations t1 ∧ ... ∧ tr ≼ u from other peers S' to S
(called foreign articulations). In this case, if all the terms t1, ..., tr are cached in S, then S can send to S'
the pair (t1 ∧ ... ∧ tr, ans(t1 ∧ ... ∧ tr, S_N)) to be stored in the cache of S'. This can be done because from
Proposition 3 and Definition 2 it follows that

$$ans(t_1 \wedge \dots \wedge t_r, S_{\mathcal{N}}) = \bigcap \{\, ans(t_i, S_{\mathcal{N}}) \mid 1 \le i \le r \,\}$$
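This identity is what allows a peer to push answers: once every ti is cached, the answer of the conjunction is obtained by intersecting the cached answers. A minimal Python sketch, assuming a dict-based cache from terms to answer sets; the function name conjunction_answer is hypothetical.

```python
def conjunction_answer(terms, cache):
    """Return ans(t1 ∧ ... ∧ tr) as the intersection of the cached ans(ti).

    Returns None if some ti is not cached, in which case the pair for the
    foreign articulation cannot be pushed yet.  The cache is not modified.
    """
    if not terms or any(t not in cache for t in terms):
        return None
    return set.intersection(*(set(cache[t]) for t in terms))
```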

The cache is exploited by the modified Ask procedure (ASKc), shown in Figure 11. The modified Tell
procedure with caching (TELLc) is shown in Figure 12. The modifications are indicated by bold line numbers
and are described in a semi-formal way, in order to abstract from irrelevant details.

The cache of a peer S consists of two kinds of pairs:

- (t', ans(t', S_N)), where t' is a term in the peer's terminology T_S. Pairs of this kind are inserted into
the cache by the TELLc(QID, t', R) procedure^, when the peer S is TELL-ed the answer R for a term
query initiated by an Ask message of the form

ASK(PID, QID, t', {u, t'})

where u is a new term created by QUERY(q) to represent the original (complex) query q posed to peer
S. This means that the term t' appears in q and is not evaluated in the context of the evaluation
of a more general term. For example, this is the case of the Ask messages presented at the end of
Section 11231:

1. (Pa, q2, a2, {t, a2}), and

2. (Pa, q3, a3, {t, a3}).

In this way, based on the correctness of the query procedure (Section 4.3), it is guaranteed that
R = ans(t', S_N), i.e. the received answer R is the full answer for t' and not a subset of it reduced due
to cycles in the taxonomy (T_N, ≼_N). Thus, the pair (t', R) can be safely cached.

^ Note that TELLc(QID, t', R) takes an extra argument t', which is the term query corresponding to query id QID.

TELLc(QID: ID; t': term; R: set of objects);

1. if t' in TO-BE-CACHED log then      // t' is a term of the original query q
2.    delete t' from TO-BE-CACHED log
3.    CACHE(t', R)
4.    for each foreign articulation t1 ∧ ... ∧ tr ≼ u from another peer S to self do
5.       if t' ∈ {t1, ..., tr} and all t1, ..., tr are cached then
6.          forward to S the pair (t1 ∧ ... ∧ tr, ans(t1 ∧ ... ∧ tr, S_N)) for caching
7. (PID, QID1, t, n, QP) ← Delete1(QID)
8. QP1 ← Close(QP, QID, R)
9. if n = 1 then
10.   S ← Compute-answer(QP1)
11.   if t ∉ T_self then Persist(PID, QID1, t, 0, S)
12.   else PID:TELLc(QID1, t, S ∪ I(t))
13. else Persist(PID, QID1, t, n − 1, QP1)

Figure 12: The procedure to process Tell messages with cache

- (t1 ∧ ... ∧ tr, ans(t1 ∧ ... ∧ tr, S_N)), where t1 ∧ ... ∧ tr ≼ u is an articulation from S to S', i.e. u ∈ T_S
and t1, ..., tr ∈ T_S'. Each such pair is forwarded to S by the TELLc procedure executed at the peer
S', upon realizing that all the terms involved in the left-hand side of the articulation are stored in the
local (to S') cache. In particular, this check is made immediately after a pair (t', ans(t', S_N)) is added
to the cache of S', where t' ∈ {t1, ..., tr} (see lines 3-6 of TELLc).
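The check performed right after caching (lines 3-6 of TELLc) can be sketched as follows. The sketch rests on our own assumptions: foreign articulations are given as (peer, tail) pairs, where tail lists the local terms on the left-hand side of an articulation owned by that peer, and send is a stand-in callback for the network message. The helper name cache_and_forward is hypothetical; it mirrors the CACHE&FORWARD procedure used by the extended algorithm.

```python
def cache_and_forward(t, answer, cache, to_be_cached, foreign_articulations, send):
    """Cache (t, answer) and push any foreign-articulation pair that became complete.

    foreign_articulations: list of (peer, tail) pairs, tail a tuple of local terms.
    send(peer, tail, answer): placeholder for the message to the other peer.
    """
    cache[t] = set(answer)
    to_be_cached.discard(t)                       # t no longer awaits caching
    for peer, tail in foreign_articulations:
        if t in tail and all(u in cache for u in tail):
            # ans(t1 ∧ ... ∧ tr) is the intersection of the cached ans(ti)
            send(peer, tail, set.intersection(*(cache[u] for u in tail)))
```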

Below are the main differences of ASKc(PID, QID, t, A) with respect to the cache-less Ask:

- If the answer to the term query t ASK-ed by peer PID is in the cache, then the answer is immediately
TELL-ed to peer PID. Otherwise, if |A| = 2 then t is added to the TO-BE-CACHED log (t is a term of
the original query q). The TO-BE-CACHED log is checked by TELLc(QID, t', R). If t' is found in the
TO-BE-CACHED log then (t', R) is added to the local cache through the CACHE(t', R) command (line 3
of TELLc).

- Before processing the tail of a hyperedge h which passes the test on line 4, a test is performed to
ascertain whether the query corresponding to the tail, given by u1 ∧ ... ∧ ur, is in the cache (this test
is needed only if Ph ≠ self, i.e. h corresponds to an articulation hyperedge). If yes, the only action
taken is the insertion of ans(u1 ∧ ... ∧ ur, S_N) into the query sub-program QP being built (line 15).
If the query is not in the cache, then for each ui it is checked whether its answer is in the cache (this test is
needed only if Ph = self). If not, then the execution proceeds normally.

- If all sub-queries are cached, then when all relevant hyperedges have been processed (line 16), n is zero
but QP is not empty. In this case the test on line 21 is passed, and the result of QID is computed
in S as if closing QP in a Tell. S is subsequently returned along with I(t). If QP is empty, then no
hyperedge has been found and S = ∅, so the result returned is simply I(t).
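The computation at lines 21-22 can be sketched in Python, assuming QP holds one conjunct per processed hyperedge, so that Compute-answer amounts to a union over hyperedges of intersections over tail answers. This is our reading of the DNF semantics, not a transcription of the authors' procedures; the functions close, compute_answer and answer_for_term are hypothetical stand-ins for Close, Compute-answer and the final S ∪ I(t).

```python
def close(qp, qid, answer):
    """Close(QP, QID, R): replace query id qid by its answer R in every conjunct."""
    return [[answer if x == qid else x for x in conjunct] for conjunct in qp]


def compute_answer(qp):
    """Union over conjuncts of the intersection of their (closed) answer sets."""
    result = set()
    for conjunct in qp:
        if not conjunct or not all(isinstance(x, (set, frozenset)) for x in conjunct):
            raise ValueError("QP must be fully closed before Compute-answer")
        result |= set.intersection(*(set(x) for x in conjunct))
    return result


def answer_for_term(qp, interpretation, t):
    """S ∪ I(t), where S is empty when no hyperedge contributed (QP = ∅)."""
    s = compute_answer(qp) if qp else set()
    return s | set(interpretation.get(t, ()))
```

For instance, closing query id 7 with {2, 3, 9} in QP = [[{1, 2, 3}, 7], [{5}]] yields {2, 3} ∪ {5} = {2, 3, 5}.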

We note that our algorithms can be further extended so that TELLc caches the answer
S ∪ I(t) for term sub-queries t before TELL-ing them to the requesting peer PID (line 12 of TELLc), as long
as it is certain that S ∪ I(t) = ans(t, S_N). This is the case if, for each term u of a peer S' encountered
during the evaluation of t (including t itself), either (i) all hyperedges ({u1, ..., ur}, u) of the taxonomy B-graph of S'
pass the test of line 4 of ASKc, or (ii) u is cached. Thus, either (i) no evaluation path of u is eliminated due to
cycles in the taxonomy (T_N, ≼_N), or (ii) ans(u, S_N) is immediately retrieved from the cache.
ASKc*(PID, QID: ID; t: term; A: set of terms);

1. if t is cached then PID:TELLc*(QID, t, ans(t, S_N), full)
2. else (if |A| = 2 then add t into TO-BE-CACHED log)      // t is a term of the original query q
3.    n ← 0; QP, Q, S ← ∅; flag ← full
4.    for each hyperedge h = ({u1, ..., ur}, t) do
5.       if {u1, ..., ur} ∩ A = ∅ then
6.          if Ph ≠ self and u1 ∧ ... ∧ ur is cached then C ← {ans(u1 ∧ ... ∧ ur, S_N)}
7.          else C ← ∅
8.             for each ui do
9.                if Ph = self and ui is cached then
10.                  C ← C ∪ {ans(ui, S_N)}
11.               else
12.                  ID ← New-query-id
13.                  C ← C ∪ {ID}
14.                  n ← n + 1
15.                  Enqueue(Q, (Ph, ID, ui, A ∪ {ui}))
16.         QP ← QP ∪ {C}
17.      else flag ← partial
18.   if n > 0 then
19.      Persist(PID, QID, t, n, QP, flag)
20.      while Q ≠ ∅ do
21.         (Ph, ID, u, B) ← Dequeue(Q)
22.         Ph:ASKc*(self, ID, u, B)
23.   else if QP ≠ ∅ then S ← Compute-answer(QP)
24.      PID:TELLc*(QID, t, S ∪ I(t), flag)      // executed whether or not QP is empty

Figure 13: The extended procedure to process Ask messages with cache 

For this reason PERSIST(PID, QID, t, n, QP) and TELLc(QID, t', R) should be extended with an extra
field flag that takes the values full or partial. A (query) log object (PID, QID, t, n, QP, flag) of a peer S,
where flag = full, indicates that (i) full answers have been received for all closed term sub-queries of QP
and (ii) all hyperedges ({u1, ..., ur}, t) of the taxonomy B-graph of S have passed the test of line
4 of ASKc. If this is not the case, flag = partial. A message TELLc(QID, t', R, flag) with flag = full
indicates that R = ans(t', S_N), whereas a message TELLc(QID, t', R, flag) with flag = partial indicates
that R ⊆ ans(t', S_N). Thus, based on the flag information, the TELLc procedure executed at a peer is
always able to know whether the computed answer S ∪ I(t) for a term sub-query t requested by peer PID is
a full or a partial answer. In the case of a full answer, and if t is the head of an articulation hyperedge, then
(t, S ∪ I(t)) is cached. We note that the latter condition is not a strong one; it is needed only
in order to reduce the cache size while taking the most advantage of caching.

The extended ASKc procedure (ASKc*) and the extended TELLc procedure (TELLc*) are given in Figures
13 and 14, respectively. The modifications are indicated by bold line numbers. Note that TELLc* calls the
procedure CACHE&FORWARD (Figure 15) when a pair (t, ans(t, S_N)) is going to be stored in the cache.
Additionally, TELLc* uses the function min(flag, flag') (lines 8, 11), which returns the minimum of the
flag values flag and flag' under the ordering partial < full. This guarantees that the flag value of the
TELLc* message in line 8 and of the log object in line 11 is correct.
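The flag arithmetic and the caching condition of TELLc* can be sketched in Python by encoding the ordering partial < full as an integer enumeration, so that the built-in min implements min(flag, flag'). The names Flag, combine and should_cache are hypothetical, chosen only for this illustration.

```python
from enum import IntEnum


class Flag(IntEnum):
    """Answer completeness; the ordering PARTIAL < FULL makes min() work."""
    PARTIAL = 0
    FULL = 1


def combine(flag, flag_prime):
    """min(flag, flag'): the combined answer is full only if both parts are."""
    return min(flag, flag_prime)


def should_cache(flag, flag_prime, t, articulation_heads):
    """Cache S ∪ I(t) only for full answers of articulation-hyperedge heads."""
    return combine(flag, flag_prime) is Flag.FULL and t in articulation_heads
```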

5.1.3 Caching answers of articulation heads 

The previous algorithms will cache the most frequently used terms, taking full advantage of caching with no
extra cost for computing cached answers. However, caches may fill up very quickly. Below we investigate
the case where we cache only the heads of articulation hyperedges, as the cached answers of these terms are the
most beneficial for speeding up query answering. For instance, in the example of Figure 5 we want to cache
only a2 on peer Pa, b1 and b2 on peer Pb, and c2 on peer Pc.

For this alternative caching case, a top algorithm can be easily designed such that whenever a peer 
TELLc*(QID: ID; t': term; R: set of objects; flag': {full, partial});

1. if t' in TO-BE-CACHED log then      // t' is a term of the original query q
2.    CACHE&FORWARD(t', R)
3. (PID, QID1, t, n, QP, flag) ← Delete1(QID)
4. QP1 ← Close(QP, QID, R)
5. if n = 1 then
6.    S ← Compute-answer(QP1)
7.    if t ∉ T_self then Persist(PID, QID1, t, 0, S, full)
8.    else PID:TELLc*(QID1, t, S ∪ I(t), min(flag, flag'))
9.       if min(flag, flag') = full and t is the head of an articulation hyperedge then
10.         CACHE&FORWARD(t, S ∪ I(t))
11. else Persist(PID, QID1, t, n − 1, QP1, min(flag, flag'))

Figure 14: The extended procedure to process Tell messages with cache 



CACHE&FORWARD(t: term; R: set of objects);
// Stores the pair (t, R) in the local cache and checks whether related (foreign articulation)
// query-answer pairs can be forwarded to other peers for caching

1. CACHE(t, R)
2. if t in TO-BE-CACHED log then delete t from TO-BE-CACHED log
3. for each foreign articulation t1 ∧ ... ∧ tr ≼ u from another peer S to self do
4.    if t ∈ {t1, ..., tr} and all t1, ..., tr are cached then
5.       forward to S the pair (t1 ∧ ... ∧ tr, ans(t1 ∧ ... ∧ tr, S_N)) for caching

Figure 15: The procedure Cache&Forward 



ASKh(PID, QID: ID; t: term; A: set of terms);

1. if t is cached then PID:TELL(QID, t, ans(t, S_N))
2. else n ← 0; QP, Q, S ← ∅
3.    for each hyperedge h = ({u1, ..., ur}, t) such that {u1, ..., ur} ∩ A = ∅ do
4.       C ← ∅
5.       for each ui do
6.          if ui is cached then
7.             C ← C ∪ {ans(ui, S_N)}
8.          else
9.             ID ← New-query-id
10.            C ← C ∪ {ID}
11.            n ← n + 1
12.            Enqueue(Q, (Ph, ID, ui, A ∪ {ui}))
13.      QP ← QP ∪ {C}
14.   if n > 0 then
15.      Persist(PID, QID, t, n, QP)
16.      while Q ≠ ∅ do
17.         (Ph, ID, u, B) ← Dequeue(Q)
18.         Ph:ASKh(self, ID, u, B)
19.   else if QP ≠ ∅ then S ← Compute-answer(QP)
20.      PID:TELL(QID, t, S ∪ I(t))      // executed whether or not QP is empty

Figure 16: An alternative procedure to process ASK messages with cache
receives an external query q, it finds the local terms that are heads of articulation hyperedges and are
needed for the evaluation of the query. Then, for each such term t that is not cached, it calls the QUERY(t)
procedure (Figure 7) and caches t along with the received answer R, since it is certain that R = ans(t, S_N).
This fills the needed caches. The answer of the original query is then computed locally (e.g. by a version
of the Qe procedure modified with caching). Note that QUERY(t), in this case, should call ASKh (Figure
16), a simplified version of ASKc that issues ASKh and TELL messages. Although this approach has
the extra cost of requiring full answers for terms that do not belong to the original query q, it is the most
beneficial with respect to the trade-off between cache size and speed.
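The top algorithm can be sketched in Python under our own simplifying assumptions: the local hyperedges are given as a dict from each head to the tails of the hyperedges defining it, the set of local articulation heads is known, and query is a stand-in for QUERY(t) returning a full answer. A term is considered needed if it is reachable from the query terms by following tails; the names heads_needed and prefetch are hypothetical.

```python
from collections import deque


def heads_needed(query_terms, local_edges, articulation_heads):
    """Local articulation heads reachable from the query terms.

    local_edges maps a local term to the tails (tuples of local terms) of the
    hyperedges whose head it is; evaluating a term needs its tail terms.
    """
    seen, todo = set(query_terms), deque(query_terms)
    while todo:
        t = todo.popleft()
        for tail in local_edges.get(t, ()):
            for u in tail:
                if u not in seen:
                    seen.add(u)
                    todo.append(u)
    return {t for t in seen if t in articulation_heads}


def prefetch(query_terms, local_edges, articulation_heads, cache, query):
    """Fill the cache for every needed head; query(t) stands in for QUERY(t)."""
    for t in sorted(heads_needed(query_terms, local_edges, articulation_heads)):
        if t not in cache:
            cache[t] = query(t)   # full answer, since t is posed as a top query
    return cache
```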

Of course, another alternative is for the above-mentioned top algorithm to ask for the answers of foreign
terms t (through QUERY(t)) that appear in the bodies of articulation hyperedges, instead of asking for the
answers of (local) terms t that are heads of articulation hyperedges.

5.1.4 Synopsis 

Above we described three caching policies. Overall, four query evaluation modes can be supported by our
model. The three caching policies result in faster query evaluation, but possibly in less up-to-date results,
since taxonomies, interpretations and articulations change over time. The mode without cache yields fresher results
but slower query evaluation.

In case there are memory limitations for caches, various replacement policies could be employed, e.g. keep in
the cache only the answers of the most frequently used terms, or keep only some parts of the answers, for
instance "popular" objects according to some external information collected for this purpose (object-ranking
techniques similar to page-ranking techniques for the Web could be employed to this end).
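As one example of the first kind of policy, the Python sketch below keeps a bounded term-to-answer cache and evicts the least frequently requested term. Least-frequently-used eviction is our own illustrative choice, and the class LFUAnswerCache and its interface are hypothetical.

```python
from collections import Counter


class LFUAnswerCache:
    """Bounded term -> answer cache that evicts the least frequently requested term."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.answers = {}
        self.requests = Counter()

    def get(self, term):
        """Count the request and return the cached answer, or None."""
        self.requests[term] += 1
        return self.answers.get(term)

    def put(self, term, answer):
        """Store an answer, evicting the least requested term if the cache is full."""
        if self.capacity <= 0:
            return False
        if term not in self.answers and len(self.answers) >= self.capacity:
            victim = min(self.answers, key=lambda k: self.requests[k])
            del self.answers[victim]
        self.answers[term] = answer
        return True
```

Counting requests in get, including misses, lets a term that is often asked but not yet cached displace a rarely asked one.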

5.2 Querying for object descriptions 

The query language of our model is term-centered, in the sense that users can extract information from a
source only by asking (Boolean combinations of) terms. But sometimes it would be useful for the user to
better understand the contents of an object, or the meaning or usage of terms. In these cases, a user would
like to be able to ask "what are the terms that are used for describing this object?" This question can be
modulated in different ways, depending on whether or not only local terms are desired, and whether or not
only the most specific terms are desired. Correspondingly, an enhanced query language would offer four types of
queries, for a given object o:

- the most specific local terms describing o; assuming the local source is S = (T, ≼, Obj, I), the semantics
of this query would be inds(o);

- the local terms describing o, that is {t ∈ T | o ∈ ans(t, S)};

- the most specific terms describing o in the network; assuming N is the network, this query would
return ∪{inds_i(o) | S_i ∈ N};

- the terms describing o in the network, that is {t ∈ T_N | o ∈ ans(t, S_N)}.

The last two queries clearly make sense only if the objects are shared amongst the peers; otherwise their
results would be the same as those of the previous two, respectively.
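The two local queries can be illustrated in Python on a deliberately simplified source. We assume here that the taxonomy contains only term-to-term subsumptions t' ≼ t (the paper's taxonomies also allow DNF left-hand sides), that ans(t, S) collects I(t) together with the objects of all terms reachable downwards, and that inds(o) = {t ∈ T | o ∈ I(t)}. The function names are hypothetical.

```python
def ans(t, narrower, interp):
    """ans(t, S): objects of t and of every term t' with t' ≼ t (transitively).

    narrower[t] lists the terms directly subsumed by t; interp[t] is I(t).
    The visited set makes the traversal safe in the presence of cycles.
    """
    result, seen, stack = set(), {t}, [t]
    while stack:
        u = stack.pop()
        result |= interp.get(u, set())
        for v in narrower.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return result


def inds(o, interp):
    """Terms to which o is directly assigned: {t | o ∈ I(t)}."""
    return {t for t, objs in interp.items() if o in objs}


def describing_terms(o, terms, narrower, interp):
    """All local terms describing o: {t ∈ T | o ∈ ans(t, S)}."""
    return {t for t in terms if o in ans(t, narrower, interp)}
```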

Assuming the peers are willing to share their interpretations, an efficient way of answering queries of these
kinds would be to "invert the network", that is to assign each object o to one peer^. The designated peer can
store all terms that have been assigned to o by any peer of the network, i.e. inds_N(o). Interestingly, much
work on P2P systems has focused on the design of data structures for solving this kind of problem (see
Section 6). The existence of a Distributed Hash Table (DHT) as an additional data structure (considering
Obj as the set of keys) would allow checking whether t ∈ inds_N(o) for any t and o very efficiently, by
exchanging O(log K) messages where K n.

^ As opposed to assigning each term t to one peer.
5.3 Supporting tacit name-based articulations



In a complementary way to the network inversion discussed in the previous subsection, suppose that each
element of T_N has a unique global identity and meaning, i.e. if the taxonomies of two peers S1 and S2
contain two terms having the same name, say train1 and train2, then these two correspond to the same
"concept" train. Making the above assumption means that T_N exists before the formation of the network
and that it comprises elements that have the same meaning for all sources that will form the network^;
e.g. T_N could be the set of all Greek words, or all terms of the CACM taxonomy. Note that structured
P2P systems (like Chord and CAN) are based on this assumption (i.e. that there is a globally accepted set
of keys). In contrast, our model assumes that if the same term (e.g. word) appears in the taxonomies of
two different peers, then these occurrences do not denote the same concept; for example train1 could mean
"wagon train", while train2 could mean "instruct". So in our model all agreements must be represented
explicitly in articulations.

However, we could extend our model so as to also capture a preexisting, globally accepted
terminology T_N, as follows: if a term t appears in two peers S1 and S2, then we could assume that S1 has in
its articulation the relationship t2 ≼ t1 and that S2 has in its articulation the relationship t1 ≼ t2. Note that
this would result in symmetric articulations, i.e. it is like assuming that we have one two-way articulation
t1 ~ t2 (that is known by both S1 and S2). Although we could capture in this way the existence of a globally
accepted terminology T_N, in practice the definition of articulations would be problematic: how could a peer
discover that another peer uses the same term?

This problem could be solved by employing a DHT that stores the terms and the addresses of the peers
that use these terms. Specifically, for each term t in T_N there will be one peer that stores the addresses of all
peers that have t in their taxonomies. It follows that a peer can exploit the DHT in order to efficiently obtain
the implicit (term-to-term) articulations of its terms (without having to discover by itself the online peers
that happen to use terms that it also uses).
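A toy, single-process version of such a term directory can be sketched in Python with consistent hashing in the style of Chord: each term is stored at the first peer whose ring position follows the term's hash. This is an assumption-laden illustration, not a DHT implementation: the 32-bit ring built from truncated SHA-1 digests, the class TermDirectory and its methods are hypothetical, and routing between peers is not simulated.

```python
import bisect
import hashlib


def ring_position(key):
    """Map a string key to a point on a 2**32 identifier ring."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:4], "big")


class TermDirectory:
    """Stand-in for a DHT from terms to the addresses of the peers using them."""

    def __init__(self, peers):
        self.ring = sorted((ring_position(p), p) for p in peers)
        self.points = [pos for pos, _ in self.ring]
        self.store = {p: {} for p in peers}

    def owner(self, term):
        """Successor of the term's position on the ring (wrapping around)."""
        i = bisect.bisect_left(self.points, ring_position(term)) % len(self.ring)
        return self.ring[i][1]

    def register(self, term, address):
        """Record that the peer at `address` uses `term`."""
        self.store[self.owner(term)].setdefault(term, set()).add(address)

    def lookup(self, term):
        """Addresses of all peers that registered `term` (empty if none)."""
        return set(self.store[self.owner(term)].get(term, ()))
```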

More precisely, if t is a term of a peer P and t is involved in the query evaluation procedure (that takes
place in P), then P should ask the DHT for the addresses of the peers that also use t. It follows
that the calls to the DHT should be issued in the context of the Ask procedure, so that the resulting terms
are taken into account as articulation hyperedges. For example, if {1, 3, 5} is the set of addresses returned
by the DHT, then the peer behaves as if its articulation contained the relationships t1 ≼ t, t3 ≼ t, t5 ≼ t.
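Turning the addresses returned by the DHT into articulation hyperedges is then a direct mapping, sketched below in Python. Writing the occurrence of t at peer i as the string "t@i" is our own illustrative convention, and the function name implicit_articulations is hypothetical; each result is a hyperedge (tail, head) with a single-term tail.

```python
def implicit_articulations(t, addresses, own_address=None):
    """Articulation hyperedges implied by a shared global term t.

    For every other peer i returned by the DHT, the local peer behaves as if
    its articulation contained t_i ≼ t, i.e. the hyperedge ({t_i}, t).
    """
    return [((f"{t}@{i}",), t) for i in sorted(addresses) if i != own_address]
```

For the example above, implicit_articulations("t", {1, 3, 5}) yields the three hyperedges for t1 ≼ t, t3 ≼ t and t5 ≼ t.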

Also note that a special prefix could be used for discriminating global terms from non-global terms,
e.g. global:train. This could be extended to support several namespaces (e.g. transportation:train,
education:train).

Overall, we can exploit a DHT of this kind in order to support efficient query evaluation in cases where
both implicitly defined articulations (e.g. name-based) and explicitly defined articulations (like those
discussed in this paper) are desired.

6 Related work 

In this paper we studied the problem of evaluating content-based retrieval queries in an entirely pure P2P 
architecture (without any form of structuring), where each peer can have its own conceptual model expressed 
as a taxonomy. 

To evaluate a query q posed to a peer S, peer S propagates the incoming query (which is always expressed
over its own taxonomy) only to those peers to which S has an articulation and which can contribute to the
answer of the query (the latter is determined by the taxonomy and the articulations of S). Specifically, S
does not propagate the original query but a set of queries, each one expressed in the query language (here
vocabulary) of the recipient peer. Note that there is no form of centralized index (as in Napster [3]),
nor any flooding of queries (as in Gnutella [1]), nor any form of partitioned global index (as in Chord [36]

* In other words, it is assumed that there is already a set of agreements between all peers on a common vocabulary. These 
agreements are not represented explicitly within the network (they are external). 
and CAN [33]). Instead, we have a query propagation mechanism that is query- and articulation-dependent
(note that Semantic Overlay Networks [13] are a very simplistic approach to this). In case the objects of the
domain happen to have a unique global identity (like URIs), automatic techniques can be applied for
the construction of articulations (e.g. see [38]), and we can also obtain richer object descriptions by
aggregating the descriptions that have been associated with each object.

Moreover, note that the peers of our model are quite autonomous in the sense that they do not have
to share or publish their stored objects, taxonomies or mappings with the rest of the peers (neither with a
central server, nor with the online peers). To participate in the network, a peer just has to answer the incoming
queries by using its local base, and to propagate queries to those peers that according to its "knowledge"
(i.e. taxonomy + articulations) may contribute to the evaluation of the query. However, both of the above
tasks are optional and at the "will" of the peer.

The literature about information integration distinguishes two main approaches: the local-as-view (LAV)
and the global-as-view (GAV) approach (see [10, 28] for a comparison). In the LAV approach the contents
of the sources are defined as views over the mediator's schema, while in the GAV approach the mediator's
virtual contents are defined as views of the contents of the sources. The former approach offers flexibility
in representing the contents of the sources, but query answering is "hard" because it requires answering
queries using views ([HI [SSI US). On the other hand, the GAV approach offers easy query answering
(expansion of queries until getting to source relations), but the addition or deletion of a source implies updating
the mediator view, i.e. the definition of the mediator relations. In our case, if the articulations contain
relationships between single terms, then we have the benefits of both GAV and LAV approaches, i.e. (a) the
query processing simplicity of the GAV approach, as query processing basically reduces to unfolding the query
using the definitions specified in the mapping, so as to translate the query in terms of accesses (i.e. queries)
to the sources, and (b) the modeling scalability of the LAV approach, i.e. the addition of a new underlying
source does not require changing the previous mappings. On the other hand, term-to-query articulations
resemble the GAV approach. In a P2P setting, cycles create more complex emergent relationships. For
example, suppose a peer A has an articulation b1 ∧ b2 ≼ a1 to a peer B (this is a GAV definition for a1
of A) and a peer B has an articulation a1 ∧ a2 ≼ b3 to the peer A (this is a GAV definition for b3 of B).
Then, by taking into account the entire network, we obtain the "mixed" relationship b1 ∧ b2 ∧ a2 ≼ b3.
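The mixed relationship can be derived mechanically by unfolding: the left-hand side of A's articulation is substituted for a1 in the left-hand side of B's articulation, which is sound because b1 ∧ b2 ∧ a2 ≼ a1 ∧ a2 ≼ b3. A minimal Python sketch, assuming conjunctions are represented as sets of term names; the function unfold is hypothetical and performs a single substitution step.

```python
def unfold(body, rules):
    """One unfolding step over Horn-style articulations.

    rules is a list of (tail, head) pairs meaning tail1 ∧ ... ∧ tailr ≼ head.
    For every rule whose head occurs in `body`, yield the conjunction obtained
    by replacing that head with the rule's tail; each result is subsumed by
    `body`, hence by anything `body` is subsumed by.
    """
    body = frozenset(body)
    for tail, head in rules:
        if head in body:
            yield (body - {head}) | frozenset(tail)
```

With the articulations of the example, unfolding the left-hand side {a1, a2} of b3 gives {b1, b2, a2}, i.e. b1 ∧ b2 ∧ a2 ≼ b3.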

Recently, there have been several works on P2P systems endowed with logic-based models of the peers'
information bases and of the mappings relating them (called P2P mappings). These works can be classified
in two broad categories: (1) those assuming propositional or Horn clauses as representation language or as a
computational framework, and (2) those based on more powerful formalisms. With respect to the former
category (e.g., see [6]), our work makes an important contribution, by providing a much simpler algorithm for
performing query answering than those based on resolution. Indeed, we do rely on the theory of propositional
Horn clauses, but only for proving the correctness of our algorithm. For implementing query evaluation, we
devise an algorithm that avoids the (unnecessary) algorithmic complications that plague the methods based
on resolution. As an example, after appropriate transformations our framework can be seen as a special
case of that in [6]. Then, query evaluation can be performed by first computing the prime implicates of the
negation of each term in the query, using the resolution-based algorithms presented in [6]. As the complexity
of this problem is exponential w.r.t. the size of the taxonomy and polynomial w.r.t. the size of Obj, there
is no computational gain in using this approach. Instead, there is an algorithmic loss, since the method is
much more complicated than ours.

As for the second category above, works in this area have focused on providing highly expressive knowledge
representation languages in order to capture the widest range of applications. Notably, [11] proposes a
model allowing, among other things, for existential quantification both in the bodies and in the heads of the
mapping rules. Inevitably, such languages pose computational problems: deciding membership of a tuple
in the answer of a query is undecidable in the framework proposed by while disjunction in the rules'
heads makes the same problem coNP-hard already for datalog with unary predicates (i.e. terms), as we have
proved in Section 3.5. These problems are circumvented in both approaches by changing the semantics of a
P2P network, in particular by adopting an epistemic reading of mappings.

Below, we review in more detail several works dealing with the problem of answering (unions of)
conjunctive queries posed to a peer in logic-based P2P frameworks.
In [5], a query answering algorithm for simple P2P systems is presented where each peer S is associated
with a local database, an (exported) peer schema, and a set of local mapping rules from the schema of
the local database to the peer schema. P2P mapping rules are of the form cq1 ⇝ cq2, where cq1, cq2 are
conjunctive queries of the same arity n ≥ 1 (possibly involving existential variables), expressed over the union
of the schemas of the peers and over the schema of a single peer, respectively^. Note that this representation
framework partially subsumes our network source framework, since in our case cq1, cq2 are of arity 1, cq1 is a
conjunctive query of the form u1(x) ∧ ... ∧ ur(x) over the terminology of a single peer^, and cq2 is a single-atom
query t(x) over the terminology of the peer that the mapping (articulation) belongs to. However, simple
P2P systems cannot express the taxonomy ≼_S local to a peer S of our framework. Query answering in
simple P2P systems according to first-order logic (FOL) semantics is in general undecidable. Therefore,
the authors adopt a new semantics based on epistemic logic in order to obtain decidability for query answering.
Notably, the FOL semantics and the epistemic logic semantics coincide for our framework. In particular, in [5],
a centralized bottom-up algorithm is presented which essentially constructs a finite database RDB that
constitutes a "representative" of all the epistemic models of the P2P system. The answers to a conjunctive
query q are the answers of q w.r.t. RDB. However, though this algorithm has polynomial time complexity,
it is centralized and it suffers from the drawbacks of bottom-up computation, which does not take into account
the structure of the query.

The work in [5] is extended in [11], where a more general framework for P2P systems is considered, which
fully subsumes our framework and whose semantics is based on epistemic logic. In particular, in [11], a
peer is also associated with a set of (function-free) FOL formulas over the schema of the peer. A top-down
distributed query answering algorithm is presented, based on synchronous messaging. Essentially,
the algorithm returns a datalog program to the peer where the original query is posed, by transferring the
full extensions of the peer source predicates relevant to the query along the paths of peers involved in query
processing. The returned datalog program is used for providing the answers to the query. Our
algorithm has computational advantages w.r.t. the algorithm in [11], since during query evaluation only the
full or partial answer to a term (sub)query is transferred to the peer that posed the (sub)query, and not the
full extensions of all terms involved in its evaluation.

The framework in [34] extends our framework by considering (i) n-ary (instead of unary) predicates
(i.e. P2P mappings are general datalog rules) and (ii) a set of domain relations (also suggested in [35]),
mapping the objects of one peer to the objects of another peer. A distributed query answering algorithm
is presented, based on synchronous messaging. However, the algorithm will perform poorly in our restricted
framework^, since when a peer receives a (sub)query, it iterates through the relevant P2P mappings and
for each one of them sends a (sub)query to the appropriate peer (waiting for its answer), until a fixpoint is
reached. In our case, when a peer receives a (sub)query, each relevant P2P mapping is considered just once
and no iteration until fixpoint is required.

A P2P framework similar to [9] is presented in [25], where query answering according to FOL semantics
is investigated. Since query answering is undecidable in general, the authors present a centralized algorithm
(employed in the Piazza system [23]) which is always sound, but is complete only in the cases where query
answering can be performed in polynomial time. These cases include the condition that inclusion P2P mappings
are acyclic. However, such a condition severely restricts the modularity of the system. Note that our algorithm
is sound and complete even when there are cycles in the term dependency paths, and it always terminates.
Thus, our framework allows placing articulations between peers without further checks. This is quite
important, because the actual interconnections are not under the control of any actor in the system.
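The termination argument can be illustrated with a minimal, centralized sketch. It assumes that a taxonomy is represented as a directed hypergraph in which each hyperedge links the terms of one conjunction (its tail) to a subsuming term (its head), and it mirrors the visited-set check applied in the worked example at the end of the paper: a hyperedge whose tail contains a term already visited on the current path is skipped. Every recursive call therefore adds a new term to the path, so the recursion depth is bounded by the number of terms even when the dependency graph is cyclic. The function `answer`, the `Hypergraph` alias and the sample data are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical representation: term -> tails of its incoming hyperedges.
Hypergraph = Dict[str, List[FrozenSet[str]]]


def answer(term: str, edges: Hypergraph, interp: Dict[str, Set[str]],
           visited: FrozenSet[str] = frozenset()) -> Set[str]:
    """Objects of `term`: its own interpretation, plus, for every incoming
    hyperedge whose tail avoids the terms visited on the current path,
    the intersection of the answers for the tail terms."""
    visited = visited | {term}
    result = set(interp.get(term, set()))
    for tail in edges.get(term, []):
        if tail & visited:
            continue  # following this hyperedge would close a cycle: skip it
        # Tails are assumed non-empty; each tail term extends the path.
        parts = [answer(u, edges, interp, visited) for u in tail]
        result |= set.intersection(*parts)
    return result


# b1 and c2 subsume each other, so the term dependency graph has a cycle.
edges: Hypergraph = {
    "b1": [frozenset({"c1"}), frozenset({"c2"})],
    "c2": [frozenset({"b1"})],
}
interp = {"b1": {"o1"}, "c1": {"o2"}, "c2": {"o3"}}
print(sorted(answer("b1", edges, interp)))  # ['o1', 'o2', 'o3']
print(sorted(answer("c2", edges, interp)))  # ['o1', 'o2', 'o3']
```

Both calls return, and both terms receive the union of all three interpretations, which is what the two mutual subsumptions require.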

In [20], [19], the authors consider a framework where each peer is associated with a relational database,
and P2P mapping rules contain conjunctive queries in both the head and the body of the rule (possibly with
existential variables)‡, each expressed over the alphabet of a single peer. Again, the semantics of the system
is defined based on epistemic logic [18]. In these papers, a peer database update algorithm is provided,
allowing subsequent peer queries to be answered locally without fetching data from other nodes at query
time. The algorithm (which is based on asynchronous messaging) starts at the peer whose database is to be
updated, which sends queries to all neighbour peers according to the involved mapping rules. When a peer
receives a query, the query is processed locally by the peer itself using its own data. This first answer is
immediately sent back to the node which issued the query, and sub-queries are propagated similarly to all
neighbour peers. When a peer receives an answer, (i) it stores the answer locally, (ii) it materializes the
view represented in the head of the involved mapping rule, and (iii) it propagates the result to the peer that
issued the (sub)query. Answer propagation stops when no new answer tuples reach the peer through any
dependency path, that is, when a fixpoint is reached. In our case, the database update problem for a peer S
amounts to invoking S':Query(q) for each articulation q ⪯ t from S to another peer S', and storing the answer
locally at S. Note that our query answering algorithm is also based on asynchronous messaging. However,
since it considers a more limited framework, it is much simpler and no computation until fixpoint is required.
In particular, for each term (sub)query issued to a peer through Ask, only one answer is returned through Tell.

* In our framework, domain relations correspond to the identity relation.

† Recall that this restriction can be easily relaxed.

‡ Note that P2P mapping rules of this kind can accommodate both GAV- and LAV-style mappings, and are
referred to in the literature as GLAV mappings.
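The reduction of the database update problem to Query calls can be sketched as follows. This is a minimal illustration under simplifying assumptions: each articulation q ⪯ t is stored as a triple (q, t, S'), the query q is restricted to a single term of S', and a remote peer is modeled as a plain function returning the objects of a term, so whatever further peers S' itself contacts is abstracted away. The `Peer` class, the `update` and `local_answer` functions, and the sample data are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

# A remote peer is modeled as a function from a (single-term) query to objects.
RemoteQuery = Callable[[str], Set[str]]


@dataclass
class Peer:
    name: str
    local: Dict[str, Set[str]]  # term -> objects stored at this peer
    # Articulations (q, t, remote): q is over the remote terminology, q ⪯ t.
    articulations: List[Tuple[str, str, str]] = field(default_factory=list)
    materialized: Dict[str, Set[str]] = field(default_factory=dict)


def update(peer: Peer, network: Dict[str, RemoteQuery]) -> None:
    """One S':Query(q) call per articulation; each answer is stored under t."""
    for q, t, remote in peer.articulations:
        peer.materialized.setdefault(t, set()).update(network[remote](q))


def local_answer(peer: Peer, term: str) -> Set[str]:
    """After the update, a term query is answered without contacting peers."""
    return peer.local.get(term, set()) | peer.materialized.get(term, set())


s = Peer("S", {"Bird": {"o0"}}, [("Parrot", "Bird", "S2")])
s2_data = {"Parrot": {"o1", "o2"}}
update(s, {"S2": lambda q: s2_data.get(q, set())})
print(sorted(local_answer(s, "Bird")))  # ['o0', 'o1', 'o2']
```

Each articulation is visited exactly once, which matches the single Ask/Tell round per term (sub)query described above, as opposed to iterating until a fixpoint.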



7 Conclusions 

This study presents a model of a P2P network consisting of sources based on taxonomies. A taxonomy states 
subsumption relationships between negation-free DNF formulas on terms and negation-free conjunctions of 
terms. The language for querying such sources offers Boolean combinations of terms, in which negation
can be efficiently handled by adopting a closed-world reading of the information. An efficient,
hypergraph-based query evaluation method is presented for such sources, resting on results from the theory of
propositional clauses. It is also shown that extending the expressive power of the taxonomy language by
adding negation or full disjunction leads to the intractability of the decision problem.
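The closed-world treatment of negation can be illustrated with a small evaluator for Boolean queries. It is a minimal sketch, assuming that the answer of each term is already available as a set of objects (for instance, computed by the hypergraph-based procedure) and that Obj, the set of all known objects, is given. Under the closed-world reading, an object not returned for q is taken to satisfy ¬q, so negation becomes set complement with respect to Obj. The tuple encoding of queries, the function `evaluate`, and the sample data are hypothetical.

```python
from typing import Callable, Set, Tuple, Union

# A query is a term name, ("not", q), or ("and" | "or", q1, q2, ...).
Query = Union[str, Tuple]


def evaluate(q: Query, term_answer: Callable[[str], Set[str]],
             objects: Set[str]) -> Set[str]:
    """Evaluate a Boolean query over term answers, reading negation
    under the closed-world assumption (complement w.r.t. `objects`)."""
    if isinstance(q, str):
        return set(term_answer(q))
    op, *args = q
    if op == "not":
        (sub,) = args
        return objects - evaluate(sub, term_answer, objects)
    parts = [evaluate(a, term_answer, objects) for a in args]
    if op == "and":
        return set.intersection(*parts)
    if op == "or":
        return set().union(*parts)
    raise ValueError(f"unknown operator: {op!r}")


objects = {"o1", "o2", "o3"}
answers = {"Bird": {"o1", "o2"}, "Penguin": {"o2"}}
query = ("and", "Bird", ("not", "Penguin"))
print(evaluate(query, lambda t: answers.get(t, set()), objects))  # {'o1'}
```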

A model of a P2P network, having sources as nodes, is subsequently presented. The essential feature 
of the model is the possibility of relating the assumed disjoint peer terminologies by means of subsumption 
relationships of the same type as those in the taxonomies of the sources. The resulting system subscribes 
to the universally accepted notion of P2P information system, recently postulated also in the context of the
so-called emergent semantics [4]. It is also shown that the results presented in the paper also apply if the
subsumption relationships are formed by arbitrarily mixing terms from different terminologies.

An efficient query evaluation procedure for queries stated against such a network is presented, and proved 
correct. The procedure is a distributed version of the centralized procedure, based on an asynchronous, 
message-based interaction among the peers, aimed at favoring scalability. Some optimization techniques
are also discussed, in particular one based on caching, for which the algorithms for message processing are given.

Finally, the work is related to the most relevant papers in the area of P2P systems. It remains to be
seen whether the same efficiency can be obtained by allowing full datalog as a representation language for
information sources and for articulations. It is clear, however, that the B-graph-based algorithm presented in
this paper does not extend immediately to the general datalog case, due to the presence of multiple variables
in the rules and unification.
Completion of the example

We resume the example from the processing of the message Pa:TELL(q3, I(a3)).

- Pa:TELL(q3, I(a3))

Tell finds the object in the log and updates it. The old log on Pa was:

Pa log

(Pa, q1, t, 2, {{q2, q3}})

The new log is:

Pa log

(Pa, q1, t, 1, {{q2, I(a3)}})

- Pa:ASK(Pa, q2, a2, {t, a2})

Since there are two incoming hyperedges in a2, both in Pb, Ask enqueues 3 Ask messages to Pb, one for each
involved term:

- Pb:ASK(Pa, q4, b3, {t, a2, b3})

- Pb:ASK(Pa, q5, b1, {t, a2, b1})

- Pb:ASK(Pa, q6, b2, {t, a2, b2})

It then persists the corresponding log object. The new log is:

Pa log

(Pa, q1, t, 1, {{q2, I(a3)}})
(Pa, q2, a2, 3, {{q4}, {q5, q6}})

It then issues the 3 enqueued messages.

- Pb:ASK(Pa, q4, b3, {t, a2, b3})

Since there are no incoming hyperedges in b3, the message Pa:TELL(q4, I(b3)) is produced.

- Pa:TELL(q4, I(b3))

Tell finds the object in the log and updates it. The updated log is:

Pa log

(Pa, q1, t, 1, {{q2, I(a3)}})
(Pa, q2, a2, 2, {{I(b3)}, {q5, q6}})

- Pb:ASK(Pa, q5, b1, {t, a2, b1})

Since there are two incoming hyperedges in b1, Ask enqueues 2 Ask messages to Pc, one for each involved
term:

- Pc:ASK(Pb, q7, c1, {t, a2, b1, c1})

- Pc:ASK(Pb, q8, c2, {t, a2, b1, c2})

It then persists the corresponding log object. The log is now:

Pb log

(Pa, q5, b1, 2, {{q7}, {q8}})

- Pb:ASK(Pa, q6, b2, {t, a2, b2})

Since there is one incoming hyperedge in b2, Ask enqueues 2 Ask messages to Pc, one for each involved term:

- Pc:ASK(Pb, q9, c2, {t, a2, b2, c2})

- Pc:ASK(Pb, q10, c3, {t, a2, b2, c3})

It then persists the corresponding log object. The log is now:

Pb log

(Pa, q5, b1, 2, {{q7}, {q8}})
(Pa, q6, b2, 2, {{q9, q10}})

- Pc:ASK(Pb, q7, c1, {t, a2, b1, c1})

Since there are no incoming hyperedges in c1, Ask generates Pb:TELL(q7, I(c1)).

- Pb:TELL(q7, I(c1))

Tell finds the object in the log and updates it. The new log is:

Pb log

(Pa, q5, b1, 1, {{I(c1)}, {q8}})
(Pa, q6, b2, 2, {{q9, q10}})
- Pc:ASK(Pb, q8, c2, {t, a2, b1, c2})

Since there is one incoming hyperedge in c2, but its tail {b1, b3} has a non-empty intersection with the set
of visited terms (it contains b1), just a Tell message results: Pb:TELL(q8, I(c2)).

- Pb:TELL(q8, I(c2))

Tell finds the object in the log and updates it. The new log is:

Pb log

(Pa, q5, b1, 0, {{I(c1)}, {I(c2)}})
(Pa, q6, b2, 2, {{q9, q10}})

There are no more open calls in the updated log object, therefore the answer to the query q5 can be computed:
the collected answers I(c1) ∪ I(c2) are joined with the local interpretation I(b1). Then the object is
permanently deleted from the log and the message Pa:TELL(q5, I(b1) ∪ I(c1) ∪ I(c2)) is issued.

- Pa:TELL(q5, I(b1) ∪ I(c1) ∪ I(c2))

Tell finds the object in the log and updates it. The new log is:

Pa log

(Pa, q1, t, 1, {{q2, I(a3)}})
(Pa, q2, a2, 1, {{I(b3)}, {I(b1) ∪ I(c1) ∪ I(c2), q6}})

- Pc:ASK(Pb, q9, c2, {t, a2, b2, c2})

Since there is one incoming hyperedge in c2, whose tail {b1, b3} this time does not intersect the set of visited
terms, Ask enqueues 2 Ask messages to Pb, one for each involved term:

- Pb:ASK(Pc, q11, b1, {t, a2, b2, c2, b1})

- Pb:ASK(Pc, q12, b3, {t, a2, b2, c2, b3})

It then persists the corresponding log object. The updated log is:

Pc log

(Pb, q9, c2, 2, {{q11, q12}})

- Pc:ASK(Pb, q10, c3, {t, a2, b2, c3})

Since there are no incoming hyperedges in c3, a Tell message results: Pb:TELL(q10, I(c3)).

- Pb:TELL(q10, I(c3))

Tell finds the object in the log and updates it. The updated log is:

Pb log

(Pa, q6, b2, 1, {{q9, I(c3)}})

- Pb:ASK(Pc, q11, b1, {t, a2, b2, c2, b1})

There are two incoming hyperedges in b1, but the one having c2 in its tail generates no Ask messages, since c2
is among the visited terms. The only Ask enqueued is therefore:

- Pc:ASK(Pb, q13, c1, {t, a2, b2, c2, b1, c1})

It then persists the corresponding log object. The updated log is:

Pb log

(Pa, q6, b2, 1, {{q9, I(c3)}})
(Pc, q11, b1, 1, {{q13}})

- Pb:ASK(Pc, q12, b3, {t, a2, b2, c2, b3})

Since there are no incoming hyperedges in b3, the result is Pc:TELL(q12, I(b3)).

- Pc:TELL(q12, I(b3))

Tell finds the object in the log and updates it. The new log is:

Pc log

(Pb, q9, c2, 1, {{q11, I(b3)}})

- Pc:ASK(Pb, q13, c1, {t, a2, b2, c2, b1, c1})

Since there are no incoming hyperedges in c1, Ask issues Pb:TELL(q13, I(c1)).

- Pb:TELL(q13, I(c1))

Tell finds the object in the log and updates it. The new log is:

Pb log

(Pa, q6, b2, 1, {{q9, I(c3)}})
(Pc, q11, b1, 0, {{I(c1)}})

There are no more open calls in the updated log object, therefore the answer to the query q11 can be computed:
the collected answer I(c1) is joined with the local interpretation I(b1). Then the object is permanently deleted
from the log and the message Pc:TELL(q11, I(b1) ∪ I(c1)) is issued.

- Pc:TELL(q11, I(b1) ∪ I(c1))

Tell finds the object in the log and updates it. The updated log is: 

Pc log

(Pb, q9, c2, 0, {{I(b1) ∪ I(c1), I(b3)}})

There are no more open calls in the updated log object, therefore the answer to the query q9 can be computed.
Then the object is permanently deleted from the log and the message Pb:TELL(q9, [(I(b1) ∪ I(c1)) ∩ I(b3)] ∪ I(c2))
is issued.

- Pb:TELL(q9, [(I(b1) ∪ I(c1)) ∩ I(b3)] ∪ I(c2))

Tell finds the object in the log and updates it. The updated log is: 

Pb log 

(Pa, q6, b2, 0, {{[(I(b1) ∪ I(c1)) ∩ I(b3)] ∪ I(c2), I(c3)}})

There are no more open calls in the updated log object, therefore the answer to the query q6 can be computed.
Then the object is permanently deleted from the log and the message Pa:TELL(q6, [X ∩ I(c3)] ∪ I(b2)) is issued,
where

X = [(I(b1) ∪ I(c1)) ∩ I(b3)] ∪ I(c2)

- Pa:TELL(q6, [X ∩ I(c3)] ∪ I(b2))

Tell finds the object in the log and updates it. The updated log is: 

Pa log 

(Pa, q1, t, 1, {{q2, I(a3)}})
(Pa, q2, a2, 0, {{I(b3)}, {I(b1) ∪ I(c1) ∪ I(c2), [X ∩ I(c3)] ∪ I(b2)}})

There are no more open calls in the updated log object, therefore the answer to the query q2 can be computed.
Then the object is permanently deleted from the log and the message Pa:TELL(q2, I(a2) ∪ I(b3) ∪ (Y ∩ Z)) is
issued, where

Y = I(b1) ∪ I(c1) ∪ I(c2)
Z = [X ∩ I(c3)] ∪ I(b2)

- Pa:TELL(q2, I(a2) ∪ I(b3) ∪ (Y ∩ Z))

Tell finds the object in the log and updates it. The new log is: 

Pa log 

(Pa, q1, t, 0, {{I(a3) ∩ (I(a2) ∪ I(b3) ∪ (Y ∩ Z))}})

There are no more open calls in the updated object and q1 Tp^. Therefore, q1 must be a user (external)
query.

The Query procedure will realize that q1 is complete and return the answer to the user, thus concluding query
evaluation.
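The exchange traced above can be replayed with a small, single-process simulation. It is a sketch under stated assumptions, not the paper's implementation. The incoming hyperedges are those that can be read off the walkthrough (t from {a2, a3}; a2 from {b3} and {b1, b2}; b1 from {c1} and {c2}; b2 from {c2, c3}; c2 from {b1, b3}), a3 is assumed to have no incoming hyperedges, all messages go through one global FIFO queue, the owning peer of each term is omitted, and the interpretations I(·) are hypothetical sets of integers. As in the walkthrough, Ask skips hyperedges whose tail meets the visited terms, each log object counts its open calls, and a completed log object becomes a Tell carrying I(term) united with one intersection per hyperedge; the sketch assumes that the user query q1 is completed by the same rule. The names `simulate`, `EDGES` and `INTERP` are hypothetical.

```python
from collections import deque
from itertools import count
from typing import Dict, List, Optional, Set, Tuple

# Incoming hyperedges (tails) read off the walkthrough; owning peers omitted.
EDGES: Dict[str, List[Tuple[str, ...]]] = {
    "t": [("a2", "a3")], "a2": [("b3",), ("b1", "b2")], "a3": [],
    "b1": [("c1",), ("c2",)], "b2": [("c2", "c3")], "b3": [],
    "c1": [], "c2": [("b1", "b3")], "c3": [],
}
# Hypothetical interpretations I(term).
INTERP: Dict[str, Set[int]] = {
    "t": {1}, "a2": {2}, "a3": {2, 3, 4, 5}, "b1": {3}, "b2": {4},
    "b3": {3, 5}, "c1": {5}, "c2": {6}, "c3": {5, 6},
}


def simulate(edges, interp, user_term: str) -> Set[int]:
    ids = count(1)
    parent: Dict[str, Optional[str]] = {}  # sub-query id -> id whose log awaits it
    log: Dict[str, dict] = {}              # query id -> open log object
    queue = deque()

    root = f"q{next(ids)}"
    parent[root] = None                    # the user (external) query
    queue.append(("ASK", root, user_term, frozenset({user_term})))
    while queue:
        kind, qid, *rest = queue.popleft()
        if kind == "ASK":
            term, visited = rest
            tails = [tl for tl in edges[term] if not set(tl) & visited]
            if not tails:                  # nothing to ask: Tell I(term) at once
                queue.append(("TELL", qid, set(interp[term])))
                continue
            entries = []
            for tail in tails:
                conj = []
                for u in tail:             # one Ask per involved term
                    sub = f"q{next(ids)}"
                    parent[sub] = qid
                    conj.append(sub)
                    queue.append(("ASK", sub, u, visited | {u}))
                entries.append(conj)
            log[qid] = {"term": term, "open": sum(map(len, entries)),
                        "entries": entries}
        else:                              # TELL: qid is answered by rest[0]
            (ans,) = rest
            owner = parent[qid]
            if owner is None:
                return ans                 # handed to the user by Query
            obj = log[owner]
            for conj in obj["entries"]:
                if qid in conj:            # replace the pending id by its answer
                    conj[conj.index(qid)] = ans
                    obj["open"] -= 1
                    break
            if obj["open"] == 0:           # no open calls: compute and Tell
                del log[owner]
                combined = set(interp[obj["term"]])
                for conj in obj["entries"]:
                    combined |= set.intersection(*conj)
                queue.append(("TELL", owner, combined))
    raise RuntimeError("queue drained before the user query was answered")


print(sorted(simulate(EDGES, INTERP, "t")))  # [1, 2, 3, 5]
```

The order in which this sketch handles messages may differ from the walkthrough, but its result does not depend on that order: each sub-query's answer is determined by its term and its set of visited terms. With X, Y and Z as defined above, the printed set equals I(t) ∪ (I(a3) ∩ (I(a2) ∪ I(b3) ∪ (Y ∩ Z))) for the chosen sample data.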